CN110781912A - Image classification method based on channel expansion inverse convolution neural network - Google Patents


Info

Publication number
CN110781912A
Authority
CN
China
Prior art keywords
image
convolution
output
size
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910852719.0A
Other languages
Chinese (zh)
Inventor
李娇杰
张萌
李国庆
吕锋
段斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201910852719.0A
Publication of CN110781912A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method based on a channel expansion inverse convolution neural network, which comprises the following steps: (1) generating a training image set and a test image set from a large-scale image data set; (2) performing a standard convolution operation on the images in the training image set and outputting a feature image; (3) performing convolution operations on the feature image output in step (2) with DPDNet convolution inversion blocks; (4) performing global average pooling on the feature image output in step (3) to obtain a 1 × 1 output feature image; (5) passing the 1 × 1 output feature image obtained in step (4) through a fully connected layer, and finally through a normalized exponential function layer to complete the classification of the training images, thereby obtaining the DPDNet convolutional neural network model; (6) inputting the images in the test image set into the DPDNet convolutional neural network model obtained in steps (2)-(5) to complete image classification. The test accuracy of the invention is significantly improved.

Description

Image classification method based on channel expansion inverse convolution neural network
Technical Field
The invention relates to the technical field of image processing, and in particular provides an image classification method based on a channel expansion inverse convolution neural network.
Background
Convolutional Neural Networks (CNNs) developed from artificial neural networks and have become increasingly popular in many image processing tasks, such as image classification and face detection. CNNs have great advantages over traditional methods based on manual feature selection, especially in tasks with large data volumes.
Since AlexNet won the ILSVRC-2012 championship, deep convolutional neural networks have raised the performance of many computer vision tasks to new heights. The general trend has been to build deeper and more complex networks, such as VGG, GoogLeNet, ResNet, DenseNet, and ResNeXt, for higher accuracy, but these networks do not necessarily meet the computation and speed requirements of mobile devices. Therefore, reducing computational complexity is of great significance for neural networks. Currently, efficient lightweight CNN architectures are receiving increasing attention.
In recent years, researchers have shown great interest in lightweight and efficient networks. Existing approaches can be broadly divided into three categories: network pruning, data quantization, and depthwise separable convolution.
Model pruning originated as a way to reduce the parameter count of a neural network and to mitigate overfitting. Pruned CNNs are sparser, which reduces the number of network parameters and relieves memory pressure. As a method that reduces both the storage and the computation required by neural networks, model pruning is widely used in network model compression.
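As a concrete illustration of this idea, the sketch below zeroes out the smallest-magnitude weights of a toy weight list. The function name and the 50% sparsity target are illustrative choices only, not part of the invention or of any cited pruning scheme.

```python
def prune_weights(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = sorted(abs(w) for w in weights)
    cut = int(len(flat) * sparsity)  # number of weights to remove
    if cut == 0:
        return list(weights)
    threshold = flat[cut - 1]
    # Setting small weights to zero makes the network sparser,
    # reducing parameter storage as described above.
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02]
pruned = prune_weights(w, 0.5)  # the three smallest-magnitude weights become 0
```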
With data quantization, a convolutional neural network can be compressed using fixed-point data while still maintaining good accuracy. For example, Han et al. further reduced memory storage without any loss of precision through 8-bit fixed-point weight quantization. The CNN is first trained with floating-point numbers; for each convolution layer, an optimal per-layer quantization scheme is found by analyzing the statistics of the feature maps and the network parameters; after all layers are quantized, accuracy is further improved by fine-tuning, and the fine-tuned floating-point numbers are then converted to fixed-point numbers according to the previous quantization scheme. Data quantization applies to all computations of a convolutional neural network; the method is simple, has a small computational cost, and is widely used in network model compression.
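The fixed-point idea can be sketched as follows: a floating-point value is scaled, rounded, and saturated to an 8-bit signed word, then mapped back. The choice of 6 fractional bits is an illustrative assumption; the per-layer scheme selection described above is not reproduced here.

```python
def quantize(x, frac_bits, word_bits=8):
    """Round x to a signed fixed-point integer with `frac_bits` fractional bits,
    saturating to the representable range of `word_bits` bits."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * scale)))

def dequantize(q, frac_bits):
    """Map the fixed-point integer back to a floating-point value."""
    return q / (1 << frac_bits)

q = quantize(0.3, 6)      # round(0.3 * 64) = 19
x_hat = dequantize(q, 6)  # 19 / 64 = 0.296875
```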
Depthwise separable convolution decomposes a standard convolution into a depthwise convolution followed by a 1 × 1 convolution. The depthwise convolution extracts the spatial information of the input feature map, and the 1 × 1 convolution combines the features across channels. Xception uses depthwise separable convolutions and, trained on the ImageNet data set, achieves higher image classification accuracy. MobileNet achieves good results with fewer parameters in a lightweight convolutional neural network by using depthwise separable convolutions. The present invention mainly uses depthwise convolution to construct an inverted convolutional neural network structure with channel expansion to further improve parameter efficiency.
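The parameter saving that motivates this decomposition can be checked with simple arithmetic; the layer sizes below (3 × 3 kernel, 32 input channels, 64 output channels) are arbitrary example values.

```python
def standard_conv_params(k, c_in, c_out):
    # A k x k standard convolution mixes space and channels in one step.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # A k x k depthwise convolution uses one k x k filter per input channel,
    # then a 1 x 1 pointwise convolution recombines the channels.
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

std = standard_conv_params(3, 32, 64)        # 3*3*32*64 = 18432
sep = depthwise_separable_params(3, 32, 64)  # 3*3*32 + 32*64 = 2336
```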
Disclosure of Invention
The purpose of the invention: to provide an image classification method based on a channel expansion inverse convolution neural network that optimizes the convolutional neural network structure, reduces network parameters and computation cost, better captures the spatial features of the image, and improves the accuracy of image classification tests.
The invention discloses an image classification method based on a channel expansion inverse convolution neural network, which expands the number of channels of the input image with 3 × 3 depthwise convolution and further optimizes and refines the image classification method, proposing the DPDNet convolution inversion block: the input image first passes through a 3 × 3 depthwise convolution layer that expands the number of channels to construct the spatial features of the image, then through a 1 × 1 standard convolution layer that compresses the number of channels of the image to construct its channel information, and finally through another 3 × 3 depthwise convolution layer that further extracts the spatial features of the feature map.
The technical scheme is as follows: in order to realize the purpose, the invention adopts the following technical scheme:
an image classification method based on a channel expansion inverse convolution neural network comprises the following steps:
(1) generating a training image set and a testing image set from the large-scale image data set;
(2) performing standard convolution operation on the images in the training image set, and outputting a characteristic image;
(3) performing convolution operations on the feature image output in step (2) with DPDNet convolution inversion blocks;
(4) performing global average pooling on the feature image output in the step (3) to obtain a 1 × 1 output feature image;
(5) passing the 1 × 1 output feature image obtained in step (4) through a fully connected layer, and finally through a normalized exponential function layer to complete the classification of the training images, thereby obtaining the DPDNet convolutional neural network model;
(6) inputting the images in the test image set into the DPDNet convolutional neural network model obtained in steps (2)-(5) to complete image classification.
Further, the image size in the training image set and the test image set in the step (1) is n × n, wherein n is larger than or equal to 8.
Further, the convolution kernel size of the standard convolution operation in step (2) is r × r, where r ≥ 3; the stride is s0, where s0 ≥ 1; the number of channels of the output image is 4m, where m ≥ 1; and the size of the output image is n/s0 × n/s0.
Further, the step (3) comprises the following steps:
(3-1) passing the output image obtained in step (2) sequentially through N1 DPDNet convolution inversion blocks, whose numbers of output channels are 2m, 3m, 4m, …, (N1+1)m respectively; the stride of the channel-expanding convolution layer in the last DPDNet convolution inversion block is s1 and the stride of the remaining blocks is 1, and the size of the final output image is n/(s0·s1) × n/(s0·s1);
(3-2) passing the output image obtained in step (3-1) sequentially through N2 DPDNet convolution inversion blocks, whose numbers of output channels are 8m, 12m, 20m, …; the stride of the channel-expanding convolution layer in the penultimate DPDNet convolution inversion block is s2 and the stride of the remaining blocks is 1, and the size of the final output image is n/(s0·s1·s2) × n/(s0·s1·s2).
Furthermore, the method for generating the DPDNet convolution inversion block includes:
(a) performing convolution operation on an input image by adopting depth convolution, expanding the number of channels of the input image, acquiring the spatial characteristics of the input image by expanding the number of channels of the image, and then performing batch normalization and nonlinear activation operation on the acquired spatial characteristics of the input image;
The size of the input image is h × w and its number of channels is k; the convolution kernel size is 3 × 3 and the stride is s, s ≥ 1. The depthwise convolution operation expands the number of channels of the input image by a factor of m, and the dimension of the output image becomes (h/s) × (w/s) × (m·k).
(b) Performing standard convolution operation on the image after the depth convolution, compressing the number of channels of the input image, constructing the feature distribution of the image by calculating the linear combination of each channel of the input image, and then performing batch normalization and nonlinear activation operation on the feature distribution of the image;
the standard convolution operation has a convolution kernel size of 1 × 1 and a step size of 1, and compresses the number of channels of the input image so that the number of channels of the output image is k ', k'<m.k, the dimension of the output image becomes
(c) Performing convolution operation on the standard convolved image by adopting depth convolution, wherein the number of channels of the output image is the same as that of input channels, and further acquiring the spatial information of the image; then, carrying out batch normalization and nonlinear activation operation on the output image, and outputting a DPDNet convolution inversion block;
the convolution kernel size of the deep convolution operation is 3 multiplied by 3, the step size is 1, the number of channels of the output image is the same as that of the input image, so the dimension of the output image is still
Figure BDA0002197338770000033
Wherein, the expression for batch normalization is:

y_i = γ·(x_i − μ_β)/√(σ_β + ε) + δ (1)

where y_i is the i-th output feature image, x_i is the i-th input feature image, μ_β is the mean of the pixel points of the training image batch, σ_β is the variance of the pixel points of the training image batch, β is the input image batch, ε is a small positive constant, and γ and δ are parameters to be trained;
the activation function formula for the nonlinear activation operation is: y ═ max (0, x) (2);
wherein y is the output feature image and x is the input feature image.
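Formulas (1) and (2) can be sketched in a few lines; the batch here is a short list of scalars rather than feature images, purely for illustration, and the default ε is an assumed small constant.

```python
import math

def batch_norm(xs, gamma=1.0, delta=0.0, eps=1e-5):
    """Formula (1): normalize a batch by its mean and variance,
    then scale by gamma and shift by delta."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mu) / math.sqrt(var + eps) + delta for x in xs]

def relu(x):
    """Formula (2): y = max(0, x)."""
    return max(0.0, x)

ys = batch_norm([1.0, 2.0, 3.0, 4.0])  # roughly zero mean, unit variance
```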
Further, the global average pooling operation in the step (4) is to sum all pixel points of the input feature image with the size of h × w, and then divide by (h × w) to obtain the output feature image with the image size of 1 × 1.
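A minimal sketch of this pooling rule for a single 2 × 2 channel (in a real network it is applied to every channel independently):

```python
def global_average_pool(feature_map):
    """Sum all h*w pixel points of one channel, then divide by h*w,
    reducing the channel to a single 1x1 value."""
    h = len(feature_map)
    w = len(feature_map[0])
    total = sum(sum(row) for row in feature_map)
    return total / (h * w)

channel = [[1.0, 2.0],
           [3.0, 4.0]]
pooled = global_average_pool(channel)  # (1+2+3+4) / 4 = 2.5
```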
Further, in step (5), the 1 × 1 output feature image obtained in step (4) is passed through a fully connected layer that outputs z nodes, where z is the total number of classes in the image data set. The output value of each node is x_i, 1 ≤ i ≤ z, where i indicates that the input image belongs to the i-th class of the image set, i.e. each node corresponds to one class. The values x_i of the z nodes are passed through the normalized exponential function layer, which outputs z probability values P_i; the class corresponding to the largest probability value P_i is the class of the input image, and the DPDNet convolutional neural network model is thereby obtained.
Wherein, the expression of the normalized exponential function is:

P_i = e^(x_i) / Σ_(j=1)^(z) e^(x_j) (3)
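The normalized exponential function of formula (3) can be sketched as follows; subtracting the maximum logit before exponentiation is a standard numerical-stability step not stated in the text.

```python
import math

def softmax(xs):
    """Formula (3): P_i = exp(x_i) / sum_j exp(x_j)."""
    m = max(xs)  # shift for numerical stability; does not change the result
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
pred = probs.index(max(probs))  # index of the largest probability = predicted class
```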
has the advantages that: compared with the prior art, the image classification method based on the channel expansion inverse convolution neural network expands the number of channels of an input image by using 3 x 3 deep convolution (depthwise convolution), further optimizes and improves the number of channels, and obtains a more refined inverse structure called as a DPDN (digital pre-distortion network) convolution inverse block. According to the invention, an efficient and simple DPDN network structure is obtained by stacking DPDN convolution inversion blocks, compared with other common convolution neural networks, the spatial features of an input image are extracted by using the depth convolution of a 3 x 3 convolution kernel, network parameters are reduced, the problem of overlarge floating point number operation amount of the convolution neural network is further improved, the spatial features of the image are more favorably obtained, and the accuracy rate is obviously improved in the classification test of a large-scale image set.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of steps (2) - (5) of the method of the present invention;
FIG. 3 is a flowchart of the DPDNet convolution inversion block generation method.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Depthwise convolution is similar to the standard convolution operation and can likewise extract feature information from an image, but it uses fewer parameters and less computation. For example, for an input image with k channels, height h, and width w, the depthwise convolution operation separates the channels of the input image and assigns one convolution kernel to each channel, so k kernels are needed; each kernel is convolved with its corresponding channel, producing k output channels, which are finally combined into an image of dimension h × w × k. The parameters needed by depthwise convolution are markedly fewer than those of standard convolution, and unlike 1 × 1 standard convolution it extracts the spatial features of the image.
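The per-channel operation described above can be sketched directly. The tiny two-channel 3 × 3 input and 2 × 2 kernels below are illustrative, and the handling of padding and stride is simplified to "valid" padding with stride 1.

```python
def depthwise_conv(image, kernels):
    """Apply one r x r kernel per channel ('valid' padding, stride 1).

    image:   list of k channels, each h x w
    kernels: list of k kernels, each r x r (one kernel per channel)
    """
    out = []
    for channel, kernel in zip(image, kernels):
        r = len(kernel)
        h, w = len(channel), len(channel[0])
        plane = []
        for i in range(h - r + 1):
            row = []
            for j in range(w - r + 1):
                acc = sum(kernel[a][b] * channel[i + a][j + b]
                          for a in range(r) for b in range(r))
                row.append(acc)
            plane.append(row)
        out.append(plane)  # each channel is filtered independently
    return out

# Two 3x3 channels and two 2x2 kernels -> two 2x2 output channels.
img = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
       [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
ks = [[[1, 0], [0, 1]],   # sums the diagonal of each 2x2 patch
      [[1, 1], [1, 1]]]   # sums every element of each 2x2 patch
res = depthwise_conv(img, ks)
```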
As shown in fig. 1, an image classification method based on a channel expansion inverse convolution neural network includes the following steps:
(1) generating a training image set and a testing image set according to the large-scale image data set;
the images in the training image set and the test image set have a size of n × n (n ≧ 8).
(2) Performing standard convolution operation on the images in the training image set, and outputting a characteristic image;
the convolution kernel size of the standard convolution operation is r multiplied by r (r is more than or equal to 3), and the step length is s 0(s 0Not less than 1), the number of channels of the output image is 4m (m not less than 1), and the size of the output image is n/s 0×n/s 0
(3) Performing convolution operation on the characteristic image output in the step (2) by adopting a DPDN (digital pre-distortion network) convolution inversion block;
(3-1) the output image obtained in step (2) passes sequentially through N1 DPDNet convolution inversion blocks as shown in fig. 2, whose numbers of output channels are 2m, 3m, 4m, …, (N1+1)m respectively; the stride of the channel-expanding convolution layer in the last DPDNet convolution inversion block is s1 and the stride of the remaining blocks is 1, so the size of the output image after step (3-1) is n/(s0·s1) × n/(s0·s1);
(3-2) the output image obtained in step (3-1) passes sequentially through N2 DPDNet convolution inversion blocks, whose numbers of output channels are 8m, 12m, 20m, …; the stride of the channel-expanding convolution layer in the penultimate DPDNet convolution inversion block is s2 and the stride of the remaining blocks is 1, so the size of the output image after step (3-2) is n/(s0·s1·s2) × n/(s0·s1·s2).
As shown in FIG. 3, in the DPDNet convolution inversion block generation method used in the invention, a depthwise convolution is first applied to an input image of dimension h × w × k, with a 3 × 3 convolution kernel and stride s (s ≥ 1); the depthwise convolution expands the number of channels of the input image by a factor of m, so the dimension of the output image becomes (h/s) × (w/s) × (m·k). Batch normalization is then applied, followed by activation with a nonlinear function. Next, a standard convolution with a 1 × 1 kernel is applied, compressing the number of channels of the input image so that the number of channels of the output image is reduced to k′ (k′ < m·k); the dimension of the output image becomes (h/s) × (w/s) × k′. Batch normalization and nonlinear activation are applied again. Finally, a depthwise convolution with a 3 × 3 kernel and stride 1 is applied to the standard-convolved image; the number of output channels equals the number of input channels, so the dimension of the output image is still (h/s) × (w/s) × k′.
Similarly, batch normalization and nonlinear function activation are applied in sequence to obtain the output image. The above steps construct a convolution block, which is called the DPDNet convolution inversion block. FIG. 2 is a schematic diagram of the changes in the output image size and the number of channels of each convolution layer in a DPDNet convolution inversion block proposed by the invention. The specific steps are as follows:
(a) A depthwise convolution operation is applied to the input image of dimension h × w × k; the convolution kernel size is 3 × 3 and the stride is s (s ≥ 1). The depthwise convolution expands the number of channels of the input image by a factor of m, so the dimension of the output image becomes (h/s) × (w/s) × (m·k). Batch normalization is then performed according to formula (1), and activation is performed with the nonlinear function of formula (2):

y_i = γ·(x_i − μ_β)/√(σ_β + ε) + δ (1)
where y_i is the i-th output feature image, x_i is the i-th input feature image, μ_β is the mean of the pixel points of the training image batch, σ_β is the variance of the pixel points of the training image batch, β is the input image batch, ε is a small positive constant, and γ and δ are parameters to be trained;
y=max(0,x) (2);
wherein y is an output characteristic image, and x is an input characteristic image;
(b) A standard convolution operation with a 1 × 1 kernel is applied to the image after the depthwise convolution, compressing the number of channels so that the number of channels of the output image becomes k′ (k′ < m·k); the dimension of the output image becomes (h/s) × (w/s) × k′;
Then, carrying out batch normalization processing by adopting a formula (1), and carrying out activation processing by using a nonlinear function of a formula (2);
(c) A depthwise convolution with a 3 × 3 kernel and stride 1 is applied to the standard-convolved image; the number of channels of the output image is the same as that of the input image, so the dimension of the output image is still (h/s) × (w/s) × k′.
Then, the batch normalization process is performed by using formula (1), and the activation process is performed by using the nonlinear function of formula (2).
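The dimension changes through steps (a)-(c) can be traced mechanically. The function below assumes h and w are divisible by the stride s, and the concrete numbers in the example call are illustrative, not values from the patent.

```python
def dpdnet_block_shapes(h, w, k, m, k_prime, s):
    """Trace (height, width, channels) through steps (a)-(c).
    Assumes h and w are divisible by the stride s."""
    shapes = []
    # (a) 3x3 depthwise convolution, stride s, expands channels k -> m*k
    h, w = h // s, w // s
    shapes.append((h, w, m * k))
    # (b) 1x1 standard convolution, stride 1, compresses channels to k' < m*k
    assert k_prime < m * k
    shapes.append((h, w, k_prime))
    # (c) 3x3 depthwise convolution, stride 1, channel count unchanged
    shapes.append((h, w, k_prime))
    return shapes

trace = dpdnet_block_shapes(h=32, w=32, k=16, m=6, k_prime=24, s=2)
```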
In steps (a) and (c), the 1 × 1 standard convolution operation used in common network structures is replaced with a 3 × 3 depthwise convolution operation. For a convolution with k input channels and m·k output channels (m ≥ 1), the 1 × 1 standard convolution requires m·k² parameters, while the depthwise convolution requires 9·m·k parameters; the ratio is k/9, and k is generally a positive integer larger than 9. Therefore, in the image classification method based on the convolutional neural network, using the depthwise convolution operation reduces parameters and is more conducive to acquiring the spatial information of the image.
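The parameter counts m·k² versus 9·m·k, and their ratio k/9, can be verified numerically; k = 64 and m = 4 are arbitrary example values.

```python
def pointwise_conv_params(k, m):
    # 1 x 1 standard convolution from k to m*k channels: m * k^2 parameters
    return m * k * k

def depthwise_expand_params(k, m):
    # 3 x 3 depthwise convolution producing m*k channels: 9 * m * k parameters
    return 9 * m * k

k, m = 64, 4
ratio = pointwise_conv_params(k, m) / depthwise_expand_params(k, m)  # equals k/9
```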
(4) The feature image output in the step (3) is subjected to a global average pooling layer to obtain a 1 × 1 output feature image, and the global average pooling operation is to sum all pixel points of the input feature image with the size of h × w and then divide by (h × w) to obtain the output feature image with the image size of 1 × 1;
(5) The 1 × 1 output feature image obtained in step (4) passes through a fully connected layer that outputs z nodes, where z is the total number of classes in the image data set. The output value of each node is x_i (1 ≤ i ≤ z), where i indicates that the image input in step (2) belongs to the i-th class of the image set, i.e. each node corresponds to one class. The values x_i of the z nodes are then passed through a normalized exponential function (softmax) layer, which outputs z probability values P_i; the class corresponding to the largest probability value P_i is the class of the image input in step (2), and the DPDNet convolutional neural network model is thereby obtained. The softmax function expression is:

P_i = e^(x_i) / Σ_(j=1)^(z) e^(x_j) (3)
the flow charts of the above steps (2) - (5) are shown in fig. 2.
(6) In the testing stage, test images are input into the DPDNet convolutional neural network model obtained after steps (2)-(5) to complete image classification and obtain the classification accuracy on the test image data set.
Example:
the DPDN convolutional neural network provided by the invention is composed of DPD convolutional inversion blocks. The invention uses a TensorFlow deep learning neural network framework to build a proposed DPDN (digital Pre-distortion network) convolution network structure and trains two large-scale image data sets of CIFAR-10 and CIFAR-100. The CIFAR-10 image set is composed of 60000 32 x 32 color images of 10 categories, and is divided into 50000 training images and 10000 test images, wherein each category has 6000 images, and each category is divided into 5000 training images and 1000 test images; the CIFAR-100 image set consists of 60000 32X 32 color images of 100 classes, which are divided into 50000 training images and 10000 test images, each class has 600 images, and each class is divided into 500 training images and 100 test images.
Referring to Table 1, the DPDNet convolutional neural network structure proposed by the invention operates as follows. In the training stage, training images are input in batches with size 32 × 32 and 3 channels. In stage 1, a standard convolution with a 3 × 3 kernel and stride 1 is used; the number of output channels is 32 and the output size is 32 × 32. In stage 2, the input image passes sequentially through 3 DPDNet convolution inversion blocks as shown in fig. 2, whose numbers of output channels are 16, 24, and 32 respectively; the stride of the first two blocks is 1 and the stride of the channel-expanding convolution layer in the third block is 2, so the output size after stage 2 is 16 × 16. In stage 3, the input image passes sequentially through 3 DPDNet convolution inversion blocks, whose numbers of output channels are 64, 96, and 160 respectively; the stride of the channel-expanding convolution layer in the second block is 2, so the output size after stage 3 is 8 × 8. In stage 4, a global average pooling layer reduces the output feature image size to 1 × 1 while the number of channels remains 160. The final fully connected layer has 10 output channels for the CIFAR-10 image set, corresponding to its 10 classes, and 100 output channels for the CIFAR-100 image set, corresponding to its 100 classes; finally, a normalized exponential function (softmax) layer completes the image classification. After 164 rounds of training, the connection weights and bias values of the DPDNet are obtained. In the testing stage, test images are input into the DPDNet convolutional neural network model obtained after 164 rounds of training, and the classification accuracy on the test image data set is obtained.
Table 1. DPDNet convolutional neural network proposed by the present invention
Stage | Operation | Stride | Output channels | Output size
1 | 3 × 3 standard convolution | 1 | 32 | 32 × 32
2 | DPDNet convolution inversion block | 1 | 16 | 32 × 32
2 | DPDNet convolution inversion block | 1 | 24 | 32 × 32
2 | DPDNet convolution inversion block | 2 | 32 | 16 × 16
3 | DPDNet convolution inversion block | 1 | 64 | 16 × 16
3 | DPDNet convolution inversion block | 2 | 96 | 8 × 8
3 | DPDNet convolution inversion block | 1 | 160 | 8 × 8
4 | global average pooling | - | 160 | 1 × 1
4 | fully connected layer + softmax | - | 10 (CIFAR-10) / 100 (CIFAR-100) | -
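The stage-by-stage output shapes described above (for the CIFAR configuration) can be traced with a small sketch; the function and dictionary key names are illustrative, not the patent's notation.

```python
def dpdnet_stage_shapes(n=32, num_classes=10):
    """Output (height, width, channels) after each stage for the
    configuration described above (CIFAR input, one stride-2 layer
    in stage 2 and one in stage 3)."""
    shapes = {}
    shapes["stage1"] = (n, n, 32)             # 3x3 standard convolution, stride 1
    shapes["stage2"] = (n // 2, n // 2, 32)   # 3 blocks (16, 24, 32 ch), one stride-2 layer
    shapes["stage3"] = (n // 4, n // 4, 160)  # 3 blocks (64, 96, 160 ch), one stride-2 layer
    shapes["pool"] = (1, 1, 160)              # global average pooling
    shapes["fc"] = (num_classes,)             # fully connected layer + softmax
    return shapes

s = dpdnet_stage_shapes()
```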
The DPDNet convolutional neural network structure constructed by the image classification method based on the channel expansion inverse convolution neural network is trained on the CIFAR-10 and CIFAR-100 image data sets while varying the expansion-channel parameter m; after 164 rounds of training, the connection weights and bias values of the DPDNet convolutional neural network are obtained. In the testing stage, test images are input into the trained DPDNet convolutional neural network, and the classification accuracy on the image data sets is shown in Table 2. As shown in Table 2, as the expansion-channel parameter m increases, the proposed DPDNet structure uses fewer parameters than the common MobileNet V2 network structure. On the CIFAR-10 image data set, MobileNet V2 achieves higher classification test accuracy only at m = 4; in all other settings the proposed DPDNet structure is more accurate. This shows that the proposed image classification method based on the channel expansion inverse convolution neural network is more effective, and demonstrates that the proposed network structure can be efficiently applied to the classification of large-scale images.
TABLE 2 network model comparison table for image classification accuracy test results
The above is only a preferred embodiment of the present invention. It should be noted that the above embodiment does not limit the present invention; various changes and modifications made by those skilled in the art within the scope of the technical idea of the present invention fall within the protection scope of the present invention.

Claims (9)

1. An image classification method based on a channel expansion inverse convolution neural network is characterized by comprising the following steps:
(1) generating a training image set and a testing image set from the large-scale image data set;
(2) performing standard convolution operation on the images in the training image set, and outputting a characteristic image;
(3) performing convolution operation on the characteristic image output in the step (2) by adopting a DPDNet convolution inversion block;
(4) performing global average pooling on the feature image output in the step (3) to obtain a 1 × 1 output feature image;
(5) passing the output characteristic image with the size of 1 × 1 obtained in the step (4) through a fully connected layer, and finally through a normalized exponential function layer to complete the classification of the training image, thereby obtaining a DPDNet convolutional neural network model;
(6) inputting the images in the test image set into the DPDNet convolutional neural network model obtained in steps (2)-(5) to complete image classification.
2. The method for classifying the image based on the channel expansion inverse convolutional neural network of claim 1, wherein the size of the images in the training image set and the test image set in step (1) is n × n, wherein n is greater than or equal to 8.
3. The image classification method based on the channel expansion inverse convolution neural network as claimed in claim 1, wherein the convolution kernel size of the standard convolution operation in step (2) is r × r, wherein r ≥ 3; the step size is s₀, wherein s₀ ≥ 1; the number of channels of the output image is 4m, wherein m ≥ 1; and the size of the output image is n/s₀ × n/s₀.
4. The image classification method based on the channel expansion inverse convolutional neural network as claimed in claim 1, wherein the step (3) comprises the following steps:
(3-1) sequentially passing the output image obtained in step (2) through N₁ DPDNet convolution inversion blocks, the numbers of output channels being 2m, 3m, 4m, …, (N₁+1)m; the step size of the expanded-channel convolutional layer in the last DPDNet convolution inversion block is s₁, the step size in the remaining DPDNet convolution inversion blocks is 1, and the size of the final output image is n/(s₀·s₁) × n/(s₀·s₁);
(3-2) sequentially passing the output image obtained in step (3-1) through N₂ DPDNet convolution inversion blocks, the numbers of output channels being 8m, 12m, 20m, …, 2^(N₂)m; the step size of the expanded-channel convolutional layer in the penultimate convolution inversion block is s₂, the step size in the remaining convolution inversion blocks is 1, and the size of the final output image is n/(s₀·s₁·s₂) × n/(s₀·s₁·s₂).
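A minimal sketch of the spatial-size bookkeeping in this claim: each strided stage divides the n × n input by its stride, so the final feature map measures n/(s₀·s₁·s₂) per side. The values n = 32 and strides of 2 are illustrative assumptions, not taken from the patent:

```python
# Trace the spatial size of the feature map through the strided stages of the
# network: the standard convolution (stride s0), then the two DPDNet stages
# (strides s1 and s2). Only the side length is tracked, not channel counts.

def stage_size(n, strides):
    for s in strides:
        assert n % s == 0, "spatial size must stay integral"
        n //= s
    return n

print(stage_size(32, [2]))        # after the standard convolution: 16
print(stage_size(32, [2, 2]))     # after the first DPDNet stage: 8
print(stage_size(32, [2, 2, 2]))  # after the second DPDNet stage: 4
```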
5. The image classification method based on the channel expansion inverse convolutional neural network as claimed in claim 1 or 4, wherein the DPDNet convolution inversion block is generated as follows:
(a) performing a convolution operation on an input image by depth convolution, expanding the number of channels of the input image so as to acquire its spatial characteristics, and then performing batch normalization and a nonlinear activation operation on the acquired spatial characteristics of the input image;
the size of the input image is h × w, the number of channels of the input image is k, the convolution kernel size is 3 × 3, the step size is s, s ≥ 1, and the number of channels of the input image is expanded m-fold by the depth convolution operation, so that the dimensionality of the output image becomes (h/s) × (w/s) × m·k;
(b) Performing standard convolution operation on the image after the depth convolution, compressing the number of channels of the input image, constructing the feature distribution of the image by calculating the linear combination of each channel of the input image, and then performing batch normalization and nonlinear activation operation on the feature distribution of the image;
the standard convolution operation has a convolution kernel size of 1 × 1 and a step size of 1, and compresses the number of channels of the input image so that the number of channels of the output image is k′, with k′ < m·k; the dimensionality of the output image becomes (h/s) × (w/s) × k′;
(c) performing a convolution operation on the standard-convolved image by depth convolution, with the number of output channels equal to the number of input channels, to further acquire the spatial information of the image; then carrying out batch normalization and nonlinear activation on the output image to obtain the output of the DPDNet convolution inversion block;
the convolution kernel size of this depth convolution operation is 3 × 3, the step size is 1, and the number of channels of the output image is the same as that of the input image, so the dimensionality of the output image remains (h/s) × (w/s) × k′.
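The three-step block of this claim (depthwise expansion, pointwise compression, depthwise refinement) can be summarized as a shape trace; this sketch tracks tensor dimensions only and does not perform the convolution arithmetic, and all numeric values are illustrative, not taken from the patent:

```python
# Shape-tracing sketch of the DPDNet convolution inversion block of claim 5.
# Steps mirror (a)-(c): depthwise 3x3 expand m-fold with stride s, pointwise
# 1x1 compress to k' channels, depthwise 3x3 with stride 1 keeping channels.

def dpdnet_block_shape(h, w, k, m, k_prime, s):
    # (a) depthwise 3x3, stride s, expands channels m-fold
    h, w, c = h // s, w // s, m * k
    # (b) pointwise 1x1, stride 1, compresses channels to k' < m*k
    assert k_prime < m * k
    c = k_prime
    # (c) depthwise 3x3, stride 1, channel count unchanged
    return h, w, c

# illustrative values: 32x32 input, 16 channels, 6x expansion, compress to 24
print(dpdnet_block_shape(32, 32, 16, 6, 24, 2))  # (16, 16, 24)
```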
6. The image classification method based on the channel expansion inverse convolution neural network as claimed in claim 5, wherein the expression of batch normalization is:
y_i = γ · (x_i − μ_β) / √(σ_β² + ε) + δ
wherein y_i is the ith output feature image, x_i is the ith input feature image, μ_β is the mean of the pixel points of the training image set, σ_β² is the variance of the pixel points of the training image set, ε is a small positive number, and γ and δ are parameters to be trained;
the activation function formula for the nonlinear activation operation is: y ═ max (0, x);
wherein y is the output feature image and x is the input feature image.
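A minimal sketch of the batch normalization and ReLU formulas above, applied to a flat list of values; the γ, δ, and ε values are illustrative defaults, not taken from the patent:

```python
import math

def batch_norm(xs, gamma=1.0, delta=0.0, eps=1e-5):
    # y_i = gamma * (x_i - mu) / sqrt(var + eps) + delta
    mu = sum(xs) / len(xs)                           # mean over the batch
    var = sum((x - mu) ** 2 for x in xs) / len(xs)   # variance over the batch
    return [gamma * (x - mu) / math.sqrt(var + eps) + delta for x in xs]

def relu(xs):
    # y = max(0, x), applied element-wise
    return [max(0.0, x) for x in xs]

normed = batch_norm([1.0, 2.0, 3.0, 4.0])  # zero-mean, unit-variance output
print(relu(normed))                        # negative entries clipped to 0
```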
7. The method as claimed in claim 1, wherein the global average pooling operation in step (4) is to sum all pixel points of the input feature image with the size of h × w and then divide by (h × w) to obtain the output feature image with the image size of 1 × 1.
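The global average pooling of this claim can be sketched as follows (the 2 × 2 feature map is an illustrative example):

```python
# Global average pooling per claim 7: sum every pixel of an h x w feature map
# and divide by h*w, yielding a single 1x1 output value per channel.

def global_avg_pool(feature_map):
    h = len(feature_map)
    w = len(feature_map[0])
    return sum(sum(row) for row in feature_map) / (h * w)

fmap = [[1.0, 2.0], [3.0, 4.0]]  # illustrative 2 x 2 feature map
print(global_avg_pool(fmap))     # 2.5
```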
8. The image classification method based on the channel expansion inverse convolutional neural network as claimed in claim 1, wherein in step (5) the output feature image with the size of 1 × 1 obtained in step (4) is passed through a fully connected layer which outputs z nodes, z being the total number of classes of the image data set; the output value of each node is x_i, 1 ≤ i ≤ z, where i indicates that the input image belongs to the ith class of the image set, i.e. each node corresponds to one class; the z node values x_i are passed through the normalized exponential function layer to output z probability values P_i, and the class corresponding to the maximum probability value P_i is the class of the input image, whereby the DPDNet convolutional neural network model is obtained.
9. The image classification method based on the channel expansion inverse convolution neural network as claimed in claim 8, wherein the normalized exponential function expression is:
P_i = exp(x_i) / Σ_{j=1}^{z} exp(x_j)
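A minimal sketch of the normalized exponential (softmax) layer of claims 8 and 9, including a max-subtraction step commonly used for numerical stability (an implementation detail not specified in the patent):

```python
import math

def softmax(xs):
    # P_i = exp(x_i) / sum_j exp(x_j); subtracting the max leaves the
    # result unchanged but avoids overflow for large inputs
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]        # illustrative node outputs x_i
probs = softmax(scores)         # probability values P_i, summing to 1
print(probs.index(max(probs)))  # predicted class index: 2
```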
CN201910852719.0A 2019-09-10 2019-09-10 Image classification method based on channel expansion inverse convolution neural network Pending CN110781912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910852719.0A CN110781912A (en) 2019-09-10 2019-09-10 Image classification method based on channel expansion inverse convolution neural network

Publications (1)

Publication Number Publication Date
CN110781912A true CN110781912A (en) 2020-02-11

Family

ID=69384191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910852719.0A Pending CN110781912A (en) 2019-09-10 2019-09-10 Image classification method based on channel expansion inverse convolution neural network

Country Status (1)

Country Link
CN (1) CN110781912A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596258A (en) * 2018-04-27 2018-09-28 南京邮电大学 A kind of image classification method based on convolutional neural networks random pool
CN109214406A (en) * 2018-05-16 2019-01-15 长沙理工大学 Based on D-MobileNet neural network image classification method
CN110059710A (en) * 2018-01-18 2019-07-26 Aptiv技术有限公司 Device and method for carrying out image classification using convolutional neural networks

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GUOQING LI et al.: "PSDNet and DPDNet: Efficient channel expansion, Depthwise-Pointwise-Depthwise Inverted Bottleneck Block", arXiv:1909.01026v1 *
HE Huimin et al.: "Automatic safety-helmet recognition for pedestrians based on convolutional neural networks", Cable TV Technology *
ZHANG Han et al.: "Application of optimized convolutional neural networks in traffic sign recognition", Modern Electronics Technique *
XU Kehu et al.: "Intelligent Computing Methods and Their Applications", National Defense Industry Press, 31 July 2019 *
GAO Zhiqiang et al.: "Deep Learning from Introduction to Practice", China Railway Publishing House, 30 June 2018 *
LONG Min et al.: "Research on face liveness detection algorithms using convolutional neural networks", Journal of Frontiers of Computer Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298843A (en) * 2020-02-24 2021-08-24 中科寒武纪科技股份有限公司 Data quantization processing method and device, electronic equipment and storage medium
CN113298843B (en) * 2020-02-24 2024-05-14 中科寒武纪科技股份有限公司 Data quantization processing method, device, electronic equipment and storage medium
CN112001431A (en) * 2020-08-11 2020-11-27 天津大学 Efficient image classification method based on comb convolution
CN112001431B (en) * 2020-08-11 2022-06-28 天津大学 Efficient image classification method based on comb convolution
CN112287791A (en) * 2020-10-21 2021-01-29 济南浪潮高新科技投资发展有限公司 Intelligent violence and terrorism behavior detection method based on equipment side
CN112950584A (en) * 2021-03-01 2021-06-11 哈尔滨工程大学 Coating surface defect identification method based on deep learning
CN113409281A (en) * 2021-06-24 2021-09-17 上海云从企业发展有限公司 Image definition evaluation method and device based on depth model
CN113343949A (en) * 2021-08-03 2021-09-03 中国航空油料集团有限公司 Pedestrian detection model training method for universal embedded platform

Similar Documents

Publication Publication Date Title
CN110781912A (en) Image classification method based on channel expansion inverse convolution neural network
CN108510012B (en) Target rapid detection method based on multi-scale feature map
CN113326930B (en) Data processing method, neural network training method, related device and equipment
CN110929602A (en) Foundation cloud picture cloud shape identification method based on convolutional neural network
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN112329922A (en) Neural network model compression method and system based on mass spectrum data set
CN111382867A (en) Neural network compression method, data processing method and related device
CN112699899A (en) Hyperspectral image feature extraction method based on generation countermeasure network
CN113642445B (en) Hyperspectral image classification method based on full convolution neural network
CN110909874A (en) Convolution operation optimization method and device of neural network model
CN115829027A (en) Comparative learning-based federated learning sparse training method and system
CN115601751B (en) Fundus image semantic segmentation method based on domain generalization
CN116469100A (en) Dual-band image semantic segmentation method based on Transformer
Ma et al. A unified approximation framework for compressing and accelerating deep neural networks
CN112263224B (en) Medical information processing method based on FPGA edge calculation
Qi et al. Learning low resource consumption cnn through pruning and quantization
CN112308213A (en) Convolutional neural network compression method based on global feature relationship
CN110728352A (en) Large-scale image classification method based on deep convolutional neural network
CN110378466B (en) Neural network difference-based quantization method and system
CN111639751A (en) Non-zero padding training method for binary convolutional neural network
CN112215241A (en) Image feature extraction device based on small sample learning
CN114677545B (en) Lightweight image classification method based on similarity pruning and efficient module
CN110782396A (en) Light-weight image super-resolution reconstruction network and reconstruction method
CN112308215B (en) Intelligent training acceleration method and system based on data sparse characteristic in neural network
CN114549962A (en) Garden plant leaf disease classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200211)