CN108985316B

CN108985316B - Capsule network image classification and identification method for improving reconstruction network

Info

Publication number: CN108985316B
Application number: CN201810509412.6A
Authority: CN
Inventors: 段书凯; 张金; 邹显丽; 王丽丹; 耿阳阳; 陆春燕
Original assignee: Southwest University
Current assignee: Southwest University
Priority date: 2018-05-24
Filing date: 2018-05-24
Publication date: 2022-03-01
Anticipated expiration: 2038-05-24
Also published as: CN108985316A

Abstract

The invention discloses a capsule network image classification and identification method for improving a reconstruction network, which comprises the following steps: s1, constructing a capsule network; s2, inputting an image training set to the capsule network, and finishing image classification, identification and calibration after training and learning by the capsule network; s3, inputting the image to be classified to the capsule network, and outputting the vector v of the working network_jThe largest one of the median values is the obtained recognition result; s4, the capsule network outputs the recognition result of the image to be classified; wherein the reconstructed network structure of the capsule network is a deconvolution operation. Has the advantages that: a new reconstruction network structure is provided, the vector is restored into an image through deconvolution operation, network parameters are adjusted by comparing errors of the restored image and an original image, the number of calculation parameters is reduced, and more operation memories are left for hardware equipment.

Description

Capsule network image classification and identification method for improving reconstruction network

Technical Field

The invention relates to application of a capsule network in image classification, in particular to a capsule network image classification and identification method for improving a reconstruction network.

Background

In recent years, convolutional neural networks have been developed rapidly in the directions of image recognition, target detection, semantic segmentation, and the like, and generally comprise convolutional layers, active layers, pooling layers and full-link layers, where a pooling layer is an important component in a convolutional neural network, typically maximum pooling and average pooling operations, and can reduce the size of an input feature map and reduce the amount of computation of a model, but the pooling layer also has a problem of position information loss.

Aiming at the problem that position information is lost in a pooling layer in a convolutional neural network, Hinton in 2017 proposes a capsule network (capsnet), the capsule network adopts vectors as input and output, adopts a dynamic routing mechanism to update parameters, can extract position information, can extract more accurate characteristic information relative to the convolutional neural network, and is expected to replace the convolutional neural network structure at the present stage.

However, the existing capsule network design has the defect of large parameter quantity in the aspect of image processing, the model occupies a large memory, and the data volume processed by running hardware at the same time is small.

Disclosure of Invention

In order to solve the problem of large quantity of parameters of the capsule network, the invention provides a new reconstruction network structure for the existing capsule network, the vector is restored into an image through deconvolution operation, and the error of the restored image and the original image is compared to adjust the network parameters, so that the capsule network image classification and identification method for improving the reconstruction network is provided, the quantity of the calculated parameters is reduced, and more operation memories are left for hardware equipment.

In order to achieve the purpose, the invention adopts the following specific technical scheme:

a capsule network image classification and identification method for improving a reconstruction network comprises the following steps:

s1, constructing a capsule network, wherein the capsule network is provided with a working network and a proofreading network, the working network is used for inputting an image and outputting an identification result of the image, and the proofreading network is used for training and adjusting parameters of the working network;

the working network comprises a convolution structure and a full-connection structure, wherein the convolution output end of the convolution structure is connected with the full-connection input end of the full-connection structure, the convolution structure is a convolution layer and a PrimaryCaps layer which are sequentially connected, and the full-connection structure is a network structure which sequentially performs weight calculation, dynamic route adjustment and activation function operation;

the proofreading network comprises a margin Loss operation structure and a reconstruction network structure which are parallel, wherein a Loss input end of the margin Loss operation structure is connected with a full-connection output end of the full-connection structure, a reconstruction input end of the reconstruction network structure is respectively connected with the full-connection output end of the full-connection structure and a vector layer of an input image, a Loss output end of the margin Loss operation structure and a reconstruction output end of the reconstruction network structure are respectively connected with a Loss function input end of a Loss layer, and a Loss function output end of the Loss layer is connected with an optimization function calculation layer;

the reconstruction network structure comprises a Reshape layer, a deconvolution structure, a Flatten layer and a variance calculation layer which are sequentially connected, wherein the variance input end of the variance calculation layer is respectively connected with the Flatten layer and a vector layer of an input image, and the variance output end of the variance calculation layer is connected with the Loss function input end of a Loss layer;

s2, inputting an image training set to the capsule network, and finishing image classification, identification and calibration after training and learning by the capsule network;

s3, inputting the image to be classified to the capsule network, and outputting the vector v of the working network_jThe largest one of the median values is the obtained recognition result;

s4, the capsule network outputs the recognition result of the image to be classified.

The reconstruction network structure of the existing capsule network is a function operation of full connection of a plurality of layers, namely only vector transformation operation, and the operation amount is large, through the design, vector parameters are converted into image parameters for operation through a deconvolution structure, the parameter amount is reduced, but the performance such as actually obtained image processing accuracy is unchanged, and the like, so that the operation hardware of the capsule network can have larger memory margin.

Further, the specific process of the capsule network training and learning of step S2 is as follows:

s2.1, sequentially inputting the images in the image training set into a working network, and obtaining an output vector v after calculation of the working network_j；

S2.2, selecting the output vector v with the maximum parameter value_jInputting a margin loss operation structure and calculating to obtain a deviation value;

s2.3, the output vector v with the maximum parameter value_jInputting a reconstructed network structure and converting the reconstructed network structure into a feature map through a Reshape layer;

s2.4, performing deconvolution operation on the feature map by using a deconvolution structure to obtain a reconstructed image;

s2.5, converting the reconstructed image into a reconstructed vector through a Flatten layer;

s2.6, calculating the reconstructed vector and the input image vector through a sum variance calculation layer to obtain a variance vector;

s2.7, inputting the variance vector and the deviation value obtained in the step S2.2 into a Loss layer to obtain the Loss of the working network;

s2.8, the loss amount is optimized by an optimization function calculation layer and then fed back to a working network;

and S2.9, adjusting parameters of each layer from back to front in a reverse order by the working network until the identification accuracy of the working network is constant, and finishing the training and learning of the capsule network.

Output vector v of working network_jThe method comprises the steps that a plurality of vectors are provided, wherein the largest one of the vectors is an image classification recognition result of a working network, the recognition result has a certain error when the recognition result is not trained, training and learning are to reduce the error towards the correct direction, and finally an accurate recognition result is obtained.

Described further, the input of the margin loss operation structure is the output vector v of the working network_jThe output is sigma_jL_j，L_jThe calculation formula of (a) is as follows:

L_j＝T_jmax(0,m⁺-‖v_j‖)²+λ(1-T_j)max(0,‖v_j‖-m^-)²

the Loss function of the Loss layer is calculated as the output sigma of the margin Loss operation structure_jL_jAnd adding the loss amount to the reconstruction error of the reconstructed network structure to obtain the loss amount.

The actual type of the input image and the recognition result are calculated to obtain the deviation amount in the above manner.

Further described, the deconvolution structure of the reconstructed network structure is 1 deconvolution layer, 1 convolution layer, and 2 deconvolution layers connected in sequence.

Through the alternative operation of convolution and deconvolution, the calculation distortion is reduced, and the error of the deconvolution structure is prevented from being large and the working network parameters are prevented from being adjusted wrongly.

Further describing, the deconvolution layers all adopt deconvolution operations with convolution kernels of 4 × 4 and step size of 2;

the convolutional layer adopts convolution operation with convolution kernel 2 x 2 and step length 1.

Described further, the weight of the full-link structure is calculated as:

the dynamic routing adjustment is:

wherein u is_iInput vector, v, being a fully connected structure_jIn order to output the vector, the vector is,

is a weight vector, W_ijAs weight parameter, b_ijFor dynamic routing parameters, c_ijFor the tuning parameters, k is the number of dynamic routing parameters, s_jThe adjusted intermediate vector is dynamically routed.

Through the design, the full-connection structure also has dynamic routing adjustment capacity, namely the output vector v_jDynamic adjustment to dynamic routing parameters b_ijThereby adjusting the calculation error.

Described further, the function of the activation function operation is:

wherein v is_jAs an output vector, s_jThe adjusted intermediate vector is dynamically routed.

The activation function is a new activation function, and the activation function is added into the calculation of the capsule network, so that the image classification and identification accuracy can be greatly improved.

Described further, the function of the activation function operation is:

The optimization function of the optimization function calculation layer is an Adam function.

The invention has the beneficial effects that: the new reconstruction network structure is provided, the vector is restored into an image through deconvolution operation, network parameters are adjusted by comparing the error of the restored image with the error of the original image, the number of calculation parameters is reduced, and more operation memories are left for hardware equipment; a new activation function is provided, so that the image classification and identification accuracy is greatly improved.

Drawings

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic diagram of the frame structure of a capsule network;

FIG. 3 is a schematic diagram of the training learning steps of the capsule network;

FIG. 4 is a schematic diagram of a capsule network structure according to the first embodiment;

fig. 5 is a schematic diagram of a reconfigurable network structure according to the first embodiment;

FIG. 6 is a diagram of training and testing analysis according to the first embodiment;

FIG. 7 is a schematic diagram of a conventional capsule network reconstruction network architecture;

FIG. 8 is a diagram of a training and testing analysis of a conventional capsule network;

FIG. 9 is a graph comparing the test results of the first embodiment with those of the conventional capsule network;

FIG. 10 is a diagram of capsule network training and test analysis after replacement of a new activation function;

FIG. 11 is a comparison graph of the test effect of the capsule network before and after replacement of the activation function.

Detailed Description

The invention is described in further detail below with reference to the following figures and specific embodiments:

as shown in fig. 1, a capsule network image classification and identification method for improving a reconstruction network comprises the following steps:

s1, constructing a capsule network;

The capsule network is provided with a working network and a proofreading network as shown in fig. 2, wherein the working network is used for inputting an image and outputting an identification result of the image, and the proofreading network is used for training and adjusting parameters of the working network;

preferably, the reconstructed network structure in this embodiment includes a Reshape layer, a deconvolution structure, a scatter layer, and a variance calculation (SSE) layer, which are connected in sequence, where variance input ends of the sum variance calculation layer are respectively connected to the scatter layer and a vector layer of an input image, and a variance output end of the sum variance calculation layer is connected to a Loss function input end of a Loss layer, as shown in fig. 4 and 5;

as shown in fig. 4, the weight calculation of the full-connection structure is preferably:

the dynamic routing adjustment is:

In this embodiment, the function of activating function operation is:

Preferably, the input of the margin loss operation structure is the output vector v of the working network_jThe output is sigma_jL_j，L_jThe calculation formula of (a) is as follows:

L_j＝T_jmax(0,m⁺-‖v_j‖)²+λ(1-T_j)max(0,‖v_j‖-m^-)²

Preferably, the deconvolution structure of the reconstructed network structure in this embodiment is 1 deconvolution layer, 1 convolution layer, and 2 deconvolution layers connected in sequence;

preferably, the deconvolution layers all adopt deconvolution operations with convolution kernels of 4 × 4 and step length of 2;

The optimization function of the optimization function calculation layer is preferably an Adam function.

As shown in fig. 3, the specific process of capsule network training and learning in step S2 is as follows:

The images identified in this embodiment are lung slices, wherein 2526 images contain malignant lung nodules, and 3967 images do not contain lung nodules and lung nodules that are benign, i.e., 6691 total images.

70% of the data set was used as the training set and 30% as the test set. Training set: the lung nodules are not included, the lung nodules are benign 2927, and the lung nodules are malignant 1756; and (3) test set: lung nodules were not included and were 1238 benign, 770 malignant lung nodules.

Fig. 6 is a diagram of analysis of training and testing data using the reconstructed network structure of the present invention, fig. 7 is a diagram of a reconstructed network structure of a conventional capsule network, that is, a reconstructed network structure using full connection operation, fig. 8 is a diagram of analysis of training and testing data thereof, and fig. 9 is a diagram of comparison of testing effects after training of both, which shows that the difference between both is very small.

But the parameter quantity of the two is smaller through the following comparison:

the reconstruction network structure of the invention:

inputting: 2 vectors of length 16;

reshape: a length 16 vector reshape becomes a 4 × 4 feature map: no parameters;

the 4 × 4 feature maps are subjected to deconvolution operation with a convolution kernel of (4,4) and a step size of 2, to obtain 64 feature maps with a size of 8 × 8: parameters 1 × 4 × 64 are 1024;

the 64 8 by 8 signatures are convolved (convolution kernel 2 by 2) to 64 7 by 7 signatures: the parameters 64 × 2 × 64 is 16384;

the 64 7 by 7 signatures were deconvoluted (convolution kernel 4 by 4) to 32 14 by 14 signatures: parameters 64 × 4 × 32 — 32768;

the 32 14 by 14 signatures were deconvoluted (convolution kernel 4 by 4) to 1 28 by 28 signatures: the parameters 32 × 4 × 1 are 512;

1 28 × 28 signature plots (compressed) into vectors of length 784: no parameters;

therefore, the total number of parameters of the new reconstructed network is: 1024+16384+32768+512 are 50688.

The reconstruction network structure of the full connection operation is as follows:

inputting: 2 vectors of length 16;

full connection 1: 1 vector of length 16 to vector of length 512: parameter 16 × 512 — 8192;

full connection 2: length 512 vectors to length 1024 vectors: parameter 512 x 1024 ═ 524288;

full connection 3: length 1024 vectors to length 784 vectors: parameter 1024 × 784 — 802816;

therefore, the total parameters of the original reconstructed network are: 8192+524288+802816 is 1335296.

On the basis of the above scheme, the second embodiment further designs a new activation function, that is, the function of the activation function operation is:

The data analysis of the capsule network training and testing after the new activation function is replaced is shown in fig. 10, and the comparison of the testing effects before and after the activation function is replaced is shown in fig. 11, which obviously shows that the recognition accuracy of the new activation function on the capsule network is greatly improved.

Claims

1. A capsule network image classification and identification method for improving a reconstruction network is characterized by comprising the following steps:

2. The capsule network image classification and identification method of the improved reconstruction network according to claim 1, wherein: the specific process of capsule network training and learning of step S2 is as follows:

3. The capsule network image classification and identification method of the improved reconstruction network according to claim 1 or 2, characterized in that: the input of the margin loss operation structure is an output vector v of the working network_jThe output is sigma_jL_j，L_jThe calculation formula of (a) is as follows:

L_j＝T_jmax(0,m⁺-‖v_j‖)²+λ(1-T_j)max(0,‖v_j‖-m^-)²

4. The capsule network image classification and identification method of the improved reconstruction network according to claim 1, wherein: the deconvolution structure of the reconstruction network structure comprises 1 deconvolution layer, 1 convolution layer and 2 deconvolution layers which are connected in sequence.

5. The capsule network image classification and identification method of the improved reconstruction network according to claim 4, wherein: the deconvolution layers all adopt deconvolution operations with convolution kernels of 4 multiplied by 4 and step length of 2;

6. The capsule network image classification and identification method of the improved reconstruction network according to claim 1, wherein: the weight of the full connection structure is calculated as:

the dynamic routing adjustment is:

7. The capsule network image classification and identification method of the improved reconstruction network according to claim 1, wherein the function of the activation function operation is:

8. The capsule network image classification and identification method of the improved reconstruction network according to claim 1, wherein the function of the activation function operation is:

9. The capsule network image classification and identification method of the improved reconstruction network according to claim 1, wherein: the optimization function of the optimization function calculation layer is an Adam function.