CN114241245A - Image classification system based on residual error capsule neural network - Google Patents


Info

Publication number
CN114241245A
Authority
CN
China
Prior art keywords
capsule
residual
layer
network
initial
Prior art date
Legal status
Pending
Application number
CN202111587100.5A
Other languages
Chinese (zh)
Inventor
胡小方
何鹏
周跃
段书凯
Current Assignee
Southwest University
Original Assignee
Southwest University
Priority date
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to CN202111587100.5A
Publication of CN114241245A

Classifications

    • G06F18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 — Pattern recognition; fusion techniques
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/08 — Neural networks; learning methods

Abstract

The invention relates to the technical field of image classification, and in particular discloses an image classification system based on a residual capsule neural network. The system is loaded with a residual capsule neural network comprising a first residual capsule module, a second residual capsule module, a third residual capsule module and a digital capsule layer; the first residual capsule module includes a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module includes a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module includes a third residual convolution sub-network and a third initial capsule layer. Based on the idea of residual learning, the invention realizes multi-level information reuse by connecting multi-level residual capsule modules in series, which alleviates the vanishing-gradient problem as the network deepens, and introduces dilated (hole) convolution, which helps extract more features. A reconstruction network built from transposed convolutions improves network performance while reducing the number of network parameters, and hardware deployment of the capsule network on memristive crossbar arrays addresses its large amount of computation.

Description

Image classification system based on residual error capsule neural network
Technical Field
The invention relates to the technical field of image classification, in particular to an image classification system based on a residual capsule neural network.
Background
Computer vision based on Convolutional Neural Networks (CNNs) has developed rapidly in recent years, and the impact of this leap has spread to almost all industries, with applications in fields such as driving assistance, image processing and object recognition. With intensive research on convolutional neural networks, more and more new CNN structures have been proposed, remarkably improving CNN performance. However, CNNs still have shortcomings at this stage. First, a CNN is not robust to affine transformations of the target: slight rotation or stretching of the target may cause the CNN to produce an erroneous result. Although data augmentation during training can reduce such failures, it does not make the network robust to arbitrary new, unseen changes. Second, a CNN learns picture features with a regularly sliding receptive field, so it makes decisions only from local features in the input image and does not consider the relative relationships between features. In addition, to reduce the parameters of deep CNNs, pooling operations were introduced; pooling retains only the salient features in the image and thus exacerbates this drawback.
A newer network architecture, the Capsule Network (CapsNet), overcomes the above disadvantages: data is transformed from the scalars of a plain neural network into vector form, and these neurons composed of multidimensional vectors are called capsules. To match the transmission and operation of data in capsule neurons across network layers, a routing algorithm between capsules was proposed, and experiments showed that CapsNet performs excellently on the MNIST data set. Experiments also showed that a capsule network resists black-box and white-box attacks better than a convolutional network and has stronger robustness. However, because the baseline capsule network model has a simple structure, its performance on data sets with complex backgrounds is not very good. With continued research on capsule networks in recent years, many researchers have optimized and improved them in terms of network structure, routing algorithms and so on. For example, to address the poor performance caused by the simple structure of CapsNet, researchers replaced the convolution layers of the capsule network with three densely connected sub-networks, enhancing the convolution layers' ability to extract image features; the new network reduced the number of routing iterations and obtained good results on image classification. Other researchers deepened the capsule network, proposed the concept of a deep capsule network together with a dynamic routing method suited to 3D convolution to reduce network parameters, and optimized the network in the capsule dimension. Still others improved the capsule routing algorithm, proposing a dynamic routing algorithm based on an attention mechanism and implementing an activation function similar to ReLU in the capsule network.
Meanwhile, researchers have expanded the application scenarios of capsule networks. For example, a capsule network suitable for optical remote-sensing image processing can handle images from different viewing angles and achieves satisfactory performance on smaller training sets; a self-attention capsule network (SACN) significantly improves the performance of neural networks on medical data sets; capsule networks have been used for lung-cancer screening in the medical field and shown to outperform CNNs on small data sets. Therefore, research on end-side deployment schemes for high-performance capsule networks has positive significance for promoting the application of capsule networks in real scenarios.
Compared with CNNs, a capsule network has a larger number of parameters and a larger amount of computation, yet almost all deep-learning algorithms run on computers with a von Neumann architecture. The bottleneck of such computers is that the computing unit and the storage unit are separate: during operation, data transfer between them consumes a large amount of power and introduces latency, making it difficult to achieve the efficient, low-power, real-time information processing of a biological brain. A deployment scheme for the capsule network on the end side therefore needs further study.
The capsule network has natural resistance to white-box attacks and is a promising emerging research direction in deep learning, but the baseline capsule network model does not perform well on data sets with complex backgrounds, and its large amount of computation hinders deployment on end-side devices.
Disclosure of Invention
The invention provides an image classification system based on a residual capsule neural network, which solves the following technical problems: how to realize high-precision image classification based on a capsule network, and how to further classify pictures with high efficiency and low power consumption.
In order to solve the technical problems, the invention provides an image classification system based on a residual capsule neural network, which is loaded with the residual capsule neural network; the residual capsule neural network comprises a first residual capsule module, a second residual capsule module, a third residual capsule module and a digital capsule layer; the first residual capsule module comprises a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module comprises a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module comprises a third residual convolution sub-network and a third initial capsule layer;
the first residual convolution sub-network performs convolution calculation on an input original image and then outputs a first group of feature maps to the second residual convolution sub-network and the first initial capsule layer, the first initial capsule layer converts the first group of feature maps into a first initial capsule group, and the digital capsule layer converts the first initial capsule group into a first digital capsule group;
the second residual error convolution sub-network performs convolution calculation on the input first group of feature maps and outputs a second group of feature maps to the third residual error convolution sub-network and the second initial capsule layer, the second initial capsule layer converts the second group of feature maps into a second initial capsule group, and the digital capsule layer converts the second initial capsule group into a second digital capsule group;
the third residual convolution sub-network performs convolution calculation on the input second group of feature maps and outputs a third group of feature maps to the third initial capsule layer, the third initial capsule layer converts the third group of feature maps into a third initial capsule group, and the digital capsule layer converts the third initial capsule group into a third digital capsule group;
the digital capsule layer splices the first initial capsule group, the second initial capsule group and the third initial capsule group to obtain a fourth initial capsule group and converts the fourth initial capsule group into a fourth digital capsule group;
and the digital capsule layer splices and fuses four groups of digital capsules in the digital capsule layer and outputs M new digital capsules for target classification and image reconstruction, wherein M is equal to the total classification number of the image classification.
Preferably, the residual capsule neural network further comprises a reconstruction module, and the reconstruction module is configured to generate a reconstructed image with the same size as the original image according to the M new digital capsules.
Preferably, the loss function of the residual capsule neural network is:
L_total = Σ_k L_k + η·L_r

wherein L_total represents the total loss; L_k = T_k·max(0, m⁺ − ||v_k||)² + λ·(1 − T_k)·max(0, ||v_k|| − m⁻)² represents the edge (margin) loss of capsule k in the last capsule layer; T_k is the matching parameter of the class-k classification target corresponding to capsule k, with T_k = 1 if and only if the classification is correct and T_k = 0 otherwise; v_k represents the activation vector of capsule k; m⁺, m⁻ and λ are capsule-vector module-length control parameters, with m⁺ = 0.9, m⁻ = 0.1 and λ = 0.5; L_r represents the reconstruction loss, equal to the mean square error between the pixels of the reconstructed image and the original image; and η represents the weight coefficient of the reconstruction loss in the total loss.
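As an illustrative sketch (not the patented implementation), the margin loss and total loss described above can be written in NumPy as follows; the default weight `eta` below is a hypothetical value, not taken from the patent:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Edge (margin) loss summed over the M output capsules.

    v_norms : (M,) lengths ||v_k|| of the output capsule vectors
    targets : (M,) one-hot T_k, 1 iff class k is the true class
    """
    present = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float(np.sum(present + absent))

def total_loss(v_norms, targets, recon, original, eta=0.0005):
    """L_total = sum_k L_k + eta * L_r, with L_r the pixel-wise MSE."""
    l_r = float(np.mean((recon - original) ** 2))
    return margin_loss(v_norms, targets) + eta * l_r
```

With a confident correct prediction (e.g. ||v_k|| = 0.95 for the true class and 0.05 elsewhere) both max(·) terms clip to zero and the margin loss vanishes, which is the behaviour the m⁺/m⁻ thresholds are meant to enforce.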
Preferably, the first residual convolution sub-network, the second residual convolution sub-network and the third residual convolution sub-network each include two convolution layers and one skip-connection layer, and each convolution layer adopts same-padding convolution with dilated (hole) convolution introduced; the digital capsule layer converts the initial capsules into digital capsules through a dynamic routing algorithm and a squash activation function.
Preferably, in a convolutional layer of the first, second or third residual convolution sub-network, a matrix-vector multiplication operation with h inputs and o outputs takes the following form:
y_m = Σ_{n=1}^{h} w_mn·x_n, m = 1, 2, …, o (8)
In formula (8), x_n represents the n-th element of the input vector, w_mn represents the weight in the m-th row and n-th column of the weight matrix, and y_m represents the m-th output in the output vector;
in the hardware design corresponding to the convolutional layer, a first 2D memristor cross array is adopted to realize matrix-vector multiplication operation with h inputs and o outputs;
the first 2D memristive crossbar array comprises h rows and o columns of memristors, x1~xhWill convert to the row input voltage V of the first 2D memristive crossbar arrayx1~VxhWeight wmnMapping as the conductance value of the memristor of the mth row and n columns in the first 2D memristive crossbar array, and input voltage VxnWith conductance value w of corresponding memristormnMultiplying, superposing the output current flowing through each memristor through a lead to obtain column output current, and converting the output current of each column into voltage V through a current-voltage conversion circuitymAnd then output.
Preferably, for any initial capsule layer of the first initial capsule layer, the second initial capsule layer and the third initial capsule layer, the initial capsule layer has R capsule units, each capsule unit includes X convolution units, the convolution kernel size is U × U, and the operation of the initial capsule layer is implemented by using a first 3D memristive cross array on the corresponding hardware design;
the first 3D memristor cross array comprises T layers of second 2D memristor cross arrays with structures similar to those of the first 2D memristor cross array, the T layers of second 2D memristor cross arrays correspond to T input feature maps, the specification of memristors of each layer of second 2D memristor cross arrays is (R X) columns and (U) rows, each layer of second 2D memristor cross arrays operates independently, the outputs of different layers of second 2D memristor cross arrays are connected in the same column, the output voltage of each layer of second 2D memristor cross arrays is summed with the outputs of other layers in the direction perpendicular to the column, and the T feature maps are converted into one-dimensional voltage output through the first 3D memristor cross arrays;
after one voltage sequence, the first 3D memristive crossbar array outputs R × X one-dimensional voltage signals, representing one X-dimensional vector for each of the R capsule units; after all voltage sequences have been fed in serially, the initial capsule layer outputs Y × Y × R X-dimensional vectors, i.e. Y × Y × R initial capsules, where Y × Y × R represents the size of the output feature map.
Preferably, the digital capsule layer has four sub-layers for converting the four groups of initial capsules into the four corresponding groups of digital capsules. Any sub-layer comprises M capsule units; the Y × Y × R X-dimensional vectors output by the initial capsule layer and the M capsule units of the sub-layer share Y × Y × R × M weight transformation matrices, which are mapped, in the hardware design, onto Y × Y × R independent second 3D memristive crossbar arrays for parallel operation;
each second 3D memristive crossbar array comprises M layers of third 2D memristive crossbar arrays whose structure is similar to that of the first 2D memristive crossbar array. Each layer has memristors arranged in V rows × W columns, the V rows corresponding to the V-dimensional input and the W columns to the W-dimensional output; the sub-layer finally outputs M W-dimensional tensors, i.e. M digital capsules of dimension W.
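Functionally, one second 3D crossbar applies M independent V × W weight matrices to a single V-dimensional capsule. A minimal numerical sketch, with shapes assumed from the description above:

```python
import numpy as np

def capsule_transform(u, W):
    """One second 3D memristive crossbar, simulated: the M stacked layers
    each multiply the V-dimensional input capsule u by a (V, W) weight
    matrix, yielding M W-dimensional digital-capsule pre-activations.

    u : (V,)       input capsule vector
    W : (M, V, Wd) stacked weight matrices, one layer per output capsule
    """
    return np.einsum('v,mvw->mw', u, W)
```

Because every layer of the stack sees the same input voltages, all M transformations happen in one analog read, which is the parallelism the patent attributes to the crossbar design.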
Preferably, the dimensions of the digital capsules output by the four sub-layers of the digital capsule layer are W1, W2, W3 and W4 respectively; the digital capsule layer then splices these four groups of digital capsules to obtain M new digital capsules of dimension W1 + W2 + W3 + W4 for object classification and image reconstruction.
Preferably, the reconstruction module includes a fully connected layer and four transposed convolution layers; each transposed convolution layer first zero-pads the input feature map according to a rule, and then performs the convolution operation on the padded feature map;
Define the size of the input feature map of a transposed convolution layer as H × H × N and the convolution stride as S; after zero-padding, each channel of the input feature map has size H_out × H_out, calculated as follows:
H_out = H + (H − 1) × (S − 1) + 2P + O (11)
the filling rule is as follows: first, insert S-1 0S between every two pixels of the input feature map, then complement P circles of 0 values around the feature map, finally complement O rows of 0 values below the feature map, and complement O columns of 0 values to the right of the feature map.
Preferably, the data set used for training and testing the residual capsule neural network comprises ten thousand or more grayscale or RGB color images in M = 10 classes, derived from the MNIST, CIFAR-10 or SVHN data sets.
The image classification system based on the residual error capsule neural network has the beneficial effects that:
1. Based on the idea of residual learning, multi-level information reuse is realized by connecting multi-level residual capsule modules in series, alleviating the vanishing-gradient problem as the network deepens; dilated (hole) convolution is introduced into the higher-level residual capsule modules, enlarging the receptive field, helping the capsule network extract more features and improving classification accuracy;
2. A reconstruction network built from transposed convolutions is provided, improving network performance while reducing the number of network parameters;
3. An efficient, low-power hardware deployment scheme for the image-classification capsule network is provided; the low power consumption of memristive crossbar arrays and their support for parallel matrix computation are used to address the large amount of computation of the residual capsule neural network.
Drawings
FIG. 1 is a block diagram of a baseline capsule network model provided by an embodiment of the present invention;
FIG. 2 is a block diagram of an image classification system based on a residual capsule neural network according to an embodiment of the present invention;
FIG. 3 is a block diagram of a residual capsule module 1 in a residual capsule neural network according to an embodiment of the present invention;
FIG. 4 is a schematic illustration of the hole convolution provided by an embodiment of the present invention;
fig. 5 is a structural diagram of a reconstruction network based on transposed convolution according to an embodiment of the present invention;
FIG. 6 is a block diagram of a first 2D memristive crossbar array provided by an embodiment of the present invention;
FIG. 7 is a block diagram of a first 3D memristive crossbar array provided by an embodiment of the present invention;
FIG. 8 is a block diagram of a second 3D memristive crossbar array provided by an embodiment of the present invention;
FIG. 9 is a graph comparing loss of residual capsule modules of MRCapsNet provided by an embodiment of the present invention;
FIG. 10 is a graph comparing the reconstruction errors of MRCapsNet network and CapsNet network under CIFAR-10 data set according to the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The embodiments and drawings are given solely for the purpose of illustration and are not to be construed as limitations of the invention, since many variations thereof are possible without departing from the spirit and scope of the invention.
In a capsule neural network, the 'capsule' is the basic element of the network: it is composed of a group of vector neurons, and its input and output are vectors. The module length of a capsule's vector represents the probability of the classified entity, and each element in the vector represents feature information of the entity. Similar to a CNN, when the receptive field of the capsule network is small it can only detect lower-level attributes, while with a large receptive field CapsNet can acquire more numerous and more complex feature attributes. Fig. 1 shows the simplified construction of CapsNet (the baseline capsule network model), consisting of four layers: a convolutional layer, a primary capsule (PrimaryCaps) layer, a digital capsule (DigitCaps) layer, and a fully connected layer. Data flows through the capsule layers in vector form according to the dynamic routing algorithm rules. During training, each low-level capsule outputs a prediction vector to the next-level capsules; by comparison with the actual labels, if the prediction result and the actual label match, the coupling coefficient between the two capsules is increased.
In the capsule dynamic routing algorithm, define u_i as the activation vector of capsule i located in the lower capsule layer, and û_{j|i} as the prediction vector from capsule i in the lower capsule layer to capsule j in the next capsule layer; then:

û_{j|i} = W_ij·u_i (1)

wherein W_ij represents a weight transformation matrix used for the matrix transformation between the part and the whole. Further:
s_j = Σ_i c_ij·û_{j|i} (2)

v_j = (||s_j||² / (1 + ||s_j||²)) · (s_j / ||s_j||) (3)

b_ij = b_ij + û_{j|i}·v_j (4)

c_ij = exp(b_ij) / Σ_{k=1}^{m} exp(b_ik) (5)
Here c_ij is a weight parameter representing the coupling of capsule i in the lower capsule layer to capsule j in the next capsule layer (containing m capsules), calculated by the softmax function; the input s_j of capsule j is obtained as the weighted sum of the c_ij and û_{j|i}; the nonlinear activation function squash ensures that the vector length lies in [0, 1), so that it can represent the probability of existence; v_j represents the activation vector of capsule j in the higher capsule layer; and b_ij indicates the degree of match between capsule i and capsule j, j = 1, 2, 3, …, m. On the first iteration, b_ij is initialized to 0, so the computed c_ij are equal-probability weights. The values of all parameters are then iteratively updated by the above formulas; after 3 iterations, the weight distribution tends to converge.
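A compact NumPy sketch of the routing procedure (illustrative only; the shapes and the 3-iteration count follow the description above, and û_{j|i} is assumed precomputed via formula (1)):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Nonlinear activation of formula (3): shrinks vector length into [0, 1)."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement over prediction vectors u_hat.

    u_hat : (num_lower, num_upper, dim) predictions û_{j|i} = W_ij·u_i.
    Returns the (num_upper, dim) activation vectors v_j.
    """
    n_low, n_up, _ = u_hat.shape
    b = np.zeros((n_low, n_up))                        # b_ij initialized to 0
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # formula (5)
        s = np.einsum('ij,ijd->jd', c, u_hat)          # formula (2), weighted sum
        v = squash(s)                                  # formula (3)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)      # formula (4), agreement
    return v
```

Since b starts at zero, the first pass distributes each lower capsule's vote uniformly, exactly matching the equal-probability initialization described above.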
For a capsule k of the last capsule layer of the capsule network, the edge loss L_k is calculated by the following formula:

L_k = T_k·max(0, m⁺ − ||v_k||)² + λ·(1 − T_k)·max(0, ||v_k|| − m⁻)² (6)

In formula (6), v_k represents the activation vector of capsule k; T_k is the matching parameter of the class-k classification target corresponding to capsule k, with T_k = 1 if and only if the classification is correct and T_k = 0 otherwise; m⁺, m⁻ and λ are capsule-vector module-length control parameters, with m⁺ = 0.9 and m⁻ = 0.1. Setting λ = 0.5 prevents an erroneous classification result from shrinking the lengths of all capsules.
In order to realize high-precision image classification based on a capsule network and give the classification network good expressiveness on data sets with complex backgrounds, this embodiment provides an image classification system based on a residual capsule neural network, loaded with the residual capsule neural network. Since the network is realized with memristors in the hardware design described below, it is also called a Memristive Residual Capsule Network (MRCapsNet). The model structure of the system is shown in Fig. 2 and includes a first residual capsule module (residual capsule module 1), a second residual capsule module (residual capsule module 2), a third residual capsule module (residual capsule module 3), a digital capsule layer (Digit capsule layer) and a reconstruction module (reconstruction network); the first residual capsule module includes a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module includes a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module includes a third residual convolution sub-network and a third initial capsule layer.
The first residual convolution sub-network performs convolution calculation on the input original image and outputs a first group of feature maps to the second residual convolution sub-network and the first initial capsule layer, the first initial capsule layer converts the first group of feature maps into a first initial capsule group, and the digital capsule layer converts the first initial capsule group into a first digital capsule group.
The second residual error convolution sub-network performs convolution calculation on the input first group of feature maps and outputs the second group of feature maps to a third residual error convolution sub-network and a second initial capsule layer, the second initial capsule layer converts the second group of feature maps into a second initial capsule group, and the digital capsule layer converts the second initial capsule group into a second digital capsule group.
And the third residual convolution sub-network performs convolution calculation on the input second group of feature maps and outputs a third group of feature maps to a third initial capsule layer, the third initial capsule layer converts the third group of feature maps into a third initial capsule group, and the digital capsule layer converts the third initial capsule group into a third digital capsule group.
The digital capsule layer splices the first initial capsule group, the second initial capsule group and the third initial capsule group to obtain a fourth initial capsule group and converts the fourth initial capsule group into a fourth digital capsule group.
The digital capsule layer splices and fuses four groups of digital capsules in the digital capsule layer and then outputs M new digital capsules for target classification and image reconstruction, wherein M is equal to the total classification number of the image classification.
And the reconstruction module is used for generating a reconstructed image with the same size as the original image according to the M new digital capsules.
The memristive residual capsule network has a multi-level-reuse neural network structure comprising 3 residual capsule modules (ResCapBlock). The output of the first-stage ResCapBlock (the first residual capsule module, residual capsule module 1) is used to create the first-stage Digit capsules (the first digital capsule group) and simultaneously serves as the input of the second-stage ResCapBlock; similarly, the second-stage and third-stage ResCapBlocks (the second and third residual capsule modules) are built in the same manner. The three serially connected residual capsule modules realize multi-level feature reuse, and each residual capsule module also outputs Digit capsules in parallel. These Digit capsules contain features of different granularities learned at different levels, and the capsules are spliced and fused in the Digit capsule layer.
Taking the input image as a picture (32 × 32 × 3) from the CIFAR-10 data set as an example, the output feature-map size of each residual capsule module on CIFAR-10 is shown in Table 1.
TABLE 1
[Table 1 is rendered as an image in the original document; it lists the output feature-map size of each residual capsule module on CIFAR-10.]
The memristive residual capsule network takes an image as input, data flows through the network in units of capsules, and the network, trained with multi-level feature reuse and capsule splicing, is better suited to classifying complex images.
Taking residual capsule module 1 as an example, its structure is shown in Fig. 3. The module borrows the idea of residual learning proposed by ResNet (the residual network) and comprises two convolution layers, one skip-connection layer and one initial (Primary) capsule layer; unlike ResNet, it contains no pooling layer, in order to prevent the loss of spatial information. In residual capsule module 1, to increase the receptive field of the deep network, this example introduces dilated (hole) convolution, whose principle is shown in Fig. 4. Dilated convolution enlarges the receptive field by inserting holes: with the same parameters and amount of computation, the output of each convolution operation covers a larger range of information. Dilated convolution introduces a hyper-parameter called the 'dilation rate', which defines the spacing between the points of the convolution kernel. In residual capsule module 1, the input image (32 × 32 × 3) first passes through the first residual convolution sub-network (Res Convolution Subnetwork); after two same-convolution calculations, 256 feature maps (Feature-map, 32 × 32 × 256) are output and fed to the Primary capsule layer. This layer comprises 8 capsule units, each containing 12 convolution units with convolution kernel size (5 × 5); after convolution, Reshape and Squash operations, the feature maps (a third-order tensor) are converted into feature capsules (a fourth-order tensor, 14 × 14 × 12 × 8). From the Primary capsule layer onwards, the network's data are transmitted and operated on in capsule form. Residual capsule modules 2 and 3 are constructed on the same principle as residual capsule module 1.
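The receptive-field gain from the dilation rate can be made concrete. This is the standard relation for dilated kernels, stated here as an illustration rather than quoted from the patent:

```python
def effective_kernel(k, rate):
    """Effective width of a k x k convolution kernel with dilation
    rate `rate`: rate-1 holes between taps widen the receptive field
    while the parameter count (k * k weights) stays unchanged."""
    return k + (k - 1) * (rate - 1)
```

A 3 × 3 kernel with rate 2 covers a 5 × 5 window while still holding only 9 weights, which is the "larger range of information under the same parameter and calculation amount" described above.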
The Digit capsule layer identifies the classification result; the test datasets in this example are all 10-class problems, so the digital capsule layer outputs 10 capsule units. As shown in fig. 2, the memristive residual capsule network includes three residual capsule modules, which output PrimaryCaps1 (the first initial capsule group), PrimaryCaps2 (the second initial capsule group), and PrimaryCaps3 (the third initial capsule group) respectively. Each initial capsule group independently forms a corresponding digital capsule (Digit capsule) group through the dynamic routing algorithm and the squash activation function, giving the first digital capsule group (DigitCaps1), the second digital capsule group (DigitCaps2), and the third digital capsule group (DigitCaps3). As each successive residual convolution sub-network extracts progressively less information, the size of the Digit capsules decreases accordingly: DigitCaps1 (10 × 16D, D denoting dimensions), DigitCaps2 (10 × 12D), DigitCaps3 (10 × 10D). In addition, to fuse the information of different granularities extracted by the different residual convolution sub-networks, the three independent initial capsule groups are spliced at the capsule level to form a combined group, PrimaryCaps0 (the fourth initial capsule group), which likewise passes through the dynamic routing algorithm and the squash activation function to form the fourth digital capsule group, DigitCaps0 (10 × 16D). Finally, the four digital capsule groups are spliced at the capsule level in the Digit capsule layer, which then outputs 10 new digital capsules (10 × 54D) for classification and image reconstruction.
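The patent does not give pseudocode for the dynamic routing algorithm; the following is a minimal NumPy sketch of routing-by-agreement with squash activation as commonly formulated for capsule networks (the iteration count and input scale are assumptions):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-1, eps=1e-8):
    n2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, iters=3):
    """u_hat: (num_in, num_out, dim_out) prediction vectors from the initial capsules.
    Returns (num_out, dim_out) digital capsules."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))             # routing logits
    for _ in range(iters):
        c = softmax(b, axis=1)                  # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)  # weighted sum over input capsules
        v = squash(s)                           # squash activation -> (num_out, dim_out)
        b = b + (u_hat * v[None]).sum(axis=-1)  # agreement update
    return v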
As shown in fig. 5, the reconstruction module/network consists of one fully connected layer and four transposed convolution layers, and unlike the fully connected reconstruction network in the baseline capsule network model, the MRCapsNet reconstruction network focuses more on the spatial relationship between features when reconstructing images. The input of the reconstruction network is the output of the Digit capsule layer, and the output is a reconstructed image with the same size as the original image.
The loss function of MRCapsNet consists of two parts, edge loss and reconstruction loss.
The capsule network uses the length of a vector to represent the probability that the entity represented by the capsule is present: ideally, a class-k capsule has the longest vector modulus in the top-level digital capsule layer if and only if a class-k classification object is present in the image. For multi-target classification in an image, the edge loss function uses a separate margin loss Lk for each target:

Lk = Tk·max(0, m+ − ||vk||)^2 + λ·(1 − Tk)·max(0, ||vk|| − m−)^2  (6)

where Tk = 1 if and only if a classification target of class k is present in the image, m+ = 0.9, m− = 0.1, and λ = 0.5.
This example uses a reconstruction loss Lr, the mean square error between the pixels of the reconstructed image and the original image, to assist capsule network training.

To balance the ratio of edge loss to reconstruction loss, the reconstruction loss carries a weight coefficient η in the total loss; in this embodiment η is 0.512. The total loss Ltotal is computed as follows:
Ltotal = Σk Lk + η·Lr  (7)
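A minimal NumPy sketch of the margin loss of eq. (6) and the weighted total loss (function names and array shapes are illustrative, not from the patent):

```python
import numpy as np

def margin_loss(v, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Eq. (6). v: (K, D) digital capsules; targets: (K,) one-hot class indicators."""
    lengths = np.linalg.norm(v, axis=-1)
    present = targets * np.maximum(0.0, m_pos - lengths) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_neg) ** 2
    return np.sum(present + absent)

def total_loss(v, targets, recon, original, eta=0.512):
    """Total loss: summed margin losses plus eta-weighted pixel-MSE reconstruction loss."""
    l_r = np.mean((recon - original) ** 2)
    return margin_loss(v, targets) + eta * l_r
```

With a correct-class capsule longer than m+ = 0.9 and all other capsules shorter than m− = 0.1, the margin loss is exactly zero, matching the design intent of eq. (6).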
in the experiment, the loss is calculated after the multi-layer residual capsule blocks are spliced, and the combination mode strengthens the fusion of the learned characteristics of the deep-layer capsule and the shallow-layer capsule.
A large number of matrix multiplication operations are required in the training and inference of a capsule network, and they consume considerable computing time and power. In the traditional von Neumann architecture the memory unit is separated from the computing unit, creating a computing bottleneck. The concept of "neuromorphic computing" was first proposed in 1990; the idea is to combine analog circuits with very-large-scale integrated circuits to simulate biological nervous systems, so as to obtain neuromorphic systems approaching the intelligence of the human brain. In current research on neuromorphic computing devices, memristors are often used to emulate biological neurons and to construct artificial synapses. The memristor was proposed by Leon Chua in 1971, and Hewlett-Packard Labs first fabricated a nanoscale memristive element in 2008, laying the foundation for subsequent applications of memristive elements. Nanoscale memristive elements support non-volatile storage, consume little power, and are compatible with CMOS systems. In recent years, several researchers have proposed neuromorphic computing architectures built on memristors.
For example, some researchers have designed a memristor-CMOS chip that realizes high-speed multiply-accumulate operations; experiments such as principal component analysis and sparse coding were realized on the chip, providing a hardware solution for low-power deployment of algorithms at the edge. Other researchers have built a fully hardware-based integrated system with a non-von Neumann architecture, composed of multiple memristive crossbar arrays; its energy efficiency is two orders of magnitude higher than that of a graphics processor, and it achieved accuracy comparable to the software algorithm in an image recognition experiment, further demonstrating the potential of memristive neuromorphic computing systems for edge-side deployment of neural networks.
Based on the memristive crossbar array, this embodiment provides a scheme for realizing the capsule neural network in memristive-crossbar hardware. Any matrix can be stored by mapping its values to the conductances of the memristors in the crossbar, and the current and voltage accumulation characteristics of the crossbar realize parallel matrix operations. Both storage and computation complete inside the crossbar, so no data needs to move between a computing unit and a storage unit, achieving compute-in-memory.
The embodiment constructs a hybrid cross array circuit consisting of memristor elements and CMOS elements, equivalently converts convolution operation into matrix-vector multiplication operation, and realizes parallel operation of the matrix-vector multiplication operation in the memristor cross array.
In each residual convolution sub-network, to avoid shrinking the output feature map and losing image-edge information, this example adopts the "same" convolution mode, implemented by zero-padding the edges of the input image; with a convolution stride of 1, the feature map output by the convolution operation has the same size as the input image. When implementing "same" convolution with a memristive crossbar array, it can be realized by setting the corresponding input signals to 0.
In a convolutional neural network, a matrix-vector multiplication operation with h inputs and o outputs can be expressed by the following formula:
ym = Σ(n=1 to h) wmn·xn,  m = 1, 2, …, o  (8)
In formula (8), xn denotes the nth element of the input vector, wmn the weight in row m, column n of the weight matrix, and ym the mth element of the output vector.
In the hardware design corresponding to the convolutional layer, the first 2D memristor crossbar array is used to implement the matrix-vector multiplication with h inputs and o outputs. As shown in fig. 6, the circuit includes h × o first wires crisscrossed but not connected to h rows and o columns, a second wire crisscrossed but not connected to the input ends of the h rows of first wires, and a third wire crisscrossed but not connected to the input ends of the o columns of first wires, and the second wire and the third wire are connected through an amplifying circuit;
a memristor is arranged at each crossing of two first wires to connect the two wires, a first auxiliary constant-value resistor (Ra) is arranged at each crossing of the first wires with the second wire and with the third wire to connect the two wires, and a current-voltage conversion circuit is connected in series at the output ends of the o columns of first wires;
the amplifying circuit comprises a first auxiliary constant value resistor and a first operational amplifier, wherein the inverting input end and the output end of the first operational amplifier are respectively connected with the second conducting wire and the third conducting wire, the non-inverting input end of the first operational amplifier is grounded, and the first auxiliary constant value resistor is connected between the inverting input end and the output end of the operational amplifier in parallel;
the current-voltage conversion circuit comprises a second auxiliary constant-value resistor (Rc) and a second operational amplifier; the second auxiliary constant-value resistor is connected in parallel between the inverting input and the output of the second operational amplifier, the output of the second operational amplifier serves as the final voltage output, and its non-inverting input is grounded.
The wires and memristors used by the first 2D memristive crossbar array are all nanoscale.
Of course, the first 2D memristive crossbar array may have other circuit structures besides the circuit structure shown in fig. 6, as long as the same function is achieved.
When the first 2D memristive crossbar array operates, the input vector x1–xh of formula (8) is converted into the row input voltages Vx1–Vxh of the array, and the weight wmn is mapped to the conductance of the memristor in row m, column n. Each input voltage Vxn is multiplied by the conductance of the corresponding memristor; the output currents through the memristors of each column are superposed on the wire to give the column output current, which the current-voltage conversion circuit converts into the voltage Vym for output. By Kirchhoff's law, the output Vym of the first 2D memristive crossbar array is expressed by formula (9), and the weight wmn by formula (10):
Vym = Rc·Σ(n=1 to h) (Ga − Gmn)·Vxn,  m = 1, 2, …, o  (9)
wmn = (Ga − Gmn)·Rc,  m = 1, 2, …, o; n = 1, 2, …, h  (10)
where Ra is the first auxiliary constant-value resistor and Ga its conductance, Rc is the second auxiliary constant-value resistor, and Gmn is the conductance of the memristor in row m, column n of the first 2D memristive crossbar array.
The weights are written into the first 2D memristive crossbar array by adjusting the conductance Gmn of the memristor at the corresponding position. Because a memristor's conductance Gmn is always positive, it cannot represent negative weights; the constant-value resistor Ra with conductance Ga is therefore introduced. With Gon and Goff denoting the conductances of the memristor's lowest- and highest-resistance states respectively, let Ga = (Gon + Goff)/2; the weight range of a memristor in the crossbar is then [−(Gon − Goff)·Rc/2, (Gon − Goff)·Rc/2].
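Under the mapping of eq. (10), the crossbar output of eq. (9) reproduces the ideal matrix-vector product of eq. (8). A NumPy sketch with illustrative device values (Rc is arbitrarily set to 1 Ω here; a real design would choose it for voltage scaling):

```python
import numpy as np

# Assumed device parameters (illustrative), conductances in siemens.
G_on, G_off = 1 / 100e3, 1 / 100e6   # lowest / highest resistance states
G_a = (G_on + G_off) / 2             # reference conductance for signed weights
R_c = 1.0                            # feedback resistor of the output op-amp (assumed)

def weight_to_conductance(w):
    """Invert eq. (10): G_mn = G_a - w_mn / R_c, clipped to the device range."""
    return np.clip(G_a - w / R_c, G_off, G_on)

def crossbar_output(G, v_in):
    """Eq. (9): V_ym = R_c * sum_n (G_a - G_mn) * V_xn."""
    return R_c * ((G_a - G) @ v_in)

w_max = (G_on - G_off) * R_c / 2
W = np.random.uniform(-w_max, w_max, size=(4, 6))   # o x h weight matrix in range
v = np.random.randn(6) * 0.1                        # row input voltages
out = crossbar_output(weight_to_conductance(W), v)  # matches the ideal product W @ v
```

Weights inside the stated range map to conductances inside [Goff, Gon], so no clipping error occurs and the analog output equals the digital matrix-vector product.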
Taking the first convolutional layer in the first residual convolutional sub-network as an example: it uses 3 × 3 convolution kernels with 3 input channels and 64 output channels. In hardware, each kernel is mapped to one column vector of length 27, so the 2D memristive crossbar array of fig. 6 has h = 27 and o = 64. During the deployment of MRCapsNet, the 3 × 3 × 3 × 64 weight values are mapped to the conductance values of the memristors in the 27 × 64 memristive crossbar array. In the network inference phase, the input test set is converted into read voltages, within the voltage threshold, applied to the crossbar array; the output voltages at terminals Vy1–Vy64 represent the output values of the corresponding channels of the capsule network. The same 2D memristive crossbar array likewise applies to the hardware deployment of the other convolutional layers in the first residual convolutional sub-network and in the second and third residual convolutional sub-networks.
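The kernel-to-column mapping can be sketched as follows (random weights stand in for trained ones, and the C-order flattening of each 3 × 3 × 3 kernel into a length-27 column is an assumption):

```python
import numpy as np

# First conv layer of the first residual sub-network: 3x3 kernels, 3 -> 64 channels.
K, C_in, C_out = 3, 3, 64
kernels = np.random.randn(C_out, C_in, K, K)

# Each kernel flattens into one column of length 27 -> a 27 x 64 crossbar weight matrix,
# matching the h = 27 rows and o = 64 columns described above.
crossbar_weights = kernels.reshape(C_out, -1).T      # (27, 64)

# One 3x3x3 input patch unrolled to a length-27 vector of read voltages.
patch = np.random.randn(C_in, K, K)
v_out = patch.reshape(-1) @ crossbar_weights          # 64 channel outputs at Vy1..Vy64

# Equivalent to applying each kernel to the patch directly:
ref = np.array([(kernels[m] * patch).sum() for m in range(C_out)])
```

Sliding the patch over the zero-padded input and repeating this read-out reproduces the full "same" convolution.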
Compared with the traditional convolutional neural network, the input and output of the capsule neural network are expanded by one dimension on the basis of the traditional convolutional neural network, so that the weight calculation among all layers in the capsule neural network is correspondingly expanded by one dimension, the matrix-vector calculation in the traditional convolutional network is changed into the matrix-matrix calculation, and a 3D memristor cross array is required to be adopted. The example constructs a 3D memristor-based crossbar array for realizing hardware deployment of the capsule layer.
For any of the first, second, and third initial capsule layers: the layer has R capsule units, each comprising X convolution units with kernel size U × U, and in the corresponding hardware design the operation of the initial capsule layer is implemented with a first 3D memristive crossbar array.

The first 3D memristive crossbar array comprises T layers of second 2D memristive crossbar arrays, each similar in structure to the first 2D memristive crossbar array and corresponding to one of the T input feature maps. The memristor array of each layer has R × X columns and U × U rows. Each layer operates independently; the outputs of different layers are connected in the same columns, and the output voltage of each layer is summed with the outputs of the other layers along the vertical column direction, so the T feature maps are converted into a one-dimensional voltage output by the first 3D memristive crossbar array.

After one voltage sequence, the first 3D memristive crossbar array outputs R × X one-dimensional voltage signals, representing one X-dimensional vector for each of the R capsule units. After all voltage sequences have been applied via serial input, the initial capsule layer outputs Y × Y × R X-dimensional vectors, i.e., Y × Y × R initial capsules, where Y × Y × X × R is the size of the output feature map.
Taking the first residual capsule module (ResCapsBlock1) as an example: its initial capsule layer has 8 capsule units, each containing 12 convolution units with kernel size 5 × 5. The initial capsule layer is therefore equivalent to 12 parallel ordinary convolutional layers, each with 256 input channels and 8 output channels, from which it follows that the initial capsule layer contains 12 × 8 convolution kernels in total.
The structure of the first 3D memristive crossbar array adopted by the first initial capsule layer is shown in fig. 7. Each layer is similar in implementation to the first 2D memristive crossbar array: the dark-gray ports are input/output ports containing the auxiliary resistors and operational amplifiers, the light-gray bar elements are nanoscale wires, the elements at the crossings between wires are nano-memristors, and the layers are connected by longitudinal buses. The first 3D memristive crossbar array has 256 layers of second 2D memristive crossbar arrays in total. Each 5 × 5 convolution kernel is mapped to one column of the array, and 96 (8 × 12) columns represent the 12 convolution kernels of each of the 8 capsule units, so each layer has specification 96 × 25 ((8 × 12) columns by (5 × 5) rows). Each layer operates independently, and the outputs of different layers are connected in the same columns. The 256 layers correspond to the 256 feature maps input to the initial capsule layer; the output voltage of each layer is summed with the outputs of the other layers along the vertical column direction, so the 256 feature maps are converted into a one-dimensional voltage output. After one voltage sequence, the first 3D memristive crossbar array outputs 96 (8 × 12) one-dimensional voltage signals, representing one 12-dimensional vector for each of the 8 capsule units. After all voltage sequences have been applied via serial input, the initial capsule layer outputs 14 × 14 × 8 twelve-dimensional vectors, i.e., 14 × 14 × 8 initial capsules.
The second initial capsule layer and the third initial capsule layer have the same specification, and the hardware deployment scheme is also implemented in the above manner.
For the digital capsule layer: it has four sub-layers for converting the four groups of initial capsules into the four corresponding groups of digital capsules. Any sub-layer comprises M capsule units; the Y × Y × R X-dimensional vectors output by the initial capsule layer share Y × Y × R × M weight transformation matrices with the M capsule units of the sub-layer, and in the hardware design these matrices are mapped to Y × Y × R independent second 3D memristive crossbar arrays for parallel operation.

Each second 3D memristive crossbar array comprises M layers of third 2D memristive crossbar arrays, similar in structure to the first 2D memristive crossbar array; the memristor array of each layer has V rows and W columns, the V rows corresponding to a V-dimensional input vector and the W columns to a W-dimensional output vector. The sub-layer finally outputs M W-dimensional tensors, i.e., M digital capsules of dimension W.

Taking the Cifar10 dataset as an example, MRCapsNet must solve a 10-class problem, i.e., M = 10, so the digital capsule layer of the network contains 10 capsule units corresponding to the 10 classes. Between each capsule in the initial capsule layer and each capsule in the digital capsule layer there is a corresponding weight transformation matrix. Taking DigitCapsLayer1, the first sub-layer of the digital capsule layer, as an example: its output is ten 16-dimensional tensors, so the 14 × 14 × 8 twelve-dimensional vectors output by the initial capsule layer PrimaryCapsLayer1 share 14 × 14 × 8 × 10 weight transformation matrices with the 10 capsule units of DigitCapsLayer1. During inference these matrices are mapped to 14 × 14 × 8 independent second 3D memristive crossbar arrays operating in parallel. The structure of each second 3D memristive crossbar array is shown in fig. 8: it is composed of 10 layers of 12 × 16 third 2D memristive crossbar arrays (similar in structure to the first 2D memristive crossbar array), where the V = 12 input rows indicate that the input of the digital capsule layer is a 12-dimensional vector and the W = 16 output columns indicate that its output is a 16-dimensional vector. All 10 layers of a second 3D memristive crossbar array receive the same 12-dimensional input from the initial capsule, and each layer, i.e., each capsule unit, independently outputs a 16-dimensional vector. DigitCapsLayer1, the first sub-layer of the digital capsule layer, finally outputs ten 16-dimensional tensors for object classification and image reconstruction. The hardware of the other sub-layers is implemented in the same manner.
Let the dimensionalities of the digital capsules output by the four sub-layers of the digital capsule layer be W1, W2, W3, and W4. The digital capsule layer then splices the four groups of digital capsules to obtain M new digital capsules of dimension W1 + W2 + W3 + W4 for object classification and image reconstruction. As shown in fig. 2, W1, W2, W3, W4 equal 16, 12, 10, and 16 respectively, W1 + W2 + W3 + W4 = 54, and M = 10, so the digital capsule layer outputs 10 new digital capsules of dimension 54.
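The capsule-level splice is a concatenation along the capsule-dimension axis; a minimal sketch with the dimensions stated above:

```python
import numpy as np

M = 10
# The four digital capsule groups: 10 capsules of dimension 16, 12, 10, and 16.
caps = [np.random.randn(M, d) for d in (16, 12, 10, 16)]
merged = np.concatenate(caps, axis=-1)   # capsule-level splice -> 10 capsules of 54-D
```

The capsule count M is preserved; only the per-capsule dimension grows to 16 + 12 + 10 + 16 = 54.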
The 3D memristive crossbar array improves packing density with a more compact structure, and adds one further dimension of parallel operation over the 2D memristive crossbar array, thereby supporting highly efficient parallel computation.
The transposed convolution layer up-samples the input tensor; a reconstruction module formed by stacking multiple transposed convolution layers restores the feature map to the original size. The first step of transposed convolution zero-pads the input feature map according to a rule; the second step performs a normal convolution operation on the zero-padded feature map. Define the size of each channel of the transposed-convolution input feature map as H × H, the convolution stride as S, and the size of each channel after zero-padding as Hout × Hout, computed as follows:
Hout=H+(H-1)×(S-1)+2P+O (11)
the filling rule is as follows: first, insert S-1 0S between every two pixels of the input feature map, then complement P circles of 0 values around the feature map, finally complement O rows of 0 values below the feature map, and complement O columns of 0 values to the right of the feature map. In the process of realizing transposition convolution by hardware, firstly, setting 0 to the corresponding input signal according to the rule, and then mapping the weight to the memristor cross array according to the circuit realization scheme of the convolution network to finish deployment.
The following experiment and analysis were performed.
In the experiments, the memristive residual capsule network MRCapsNet is tested on three datasets, MNIST, Cifar-10, and SVHN, and compared with other capsule models. This example also performs ablation experiments to demonstrate the effectiveness of the proposed multi-stage residual capsule module. Testing is carried out in software; the environment is an Ubuntu operating system with Keras 2.1.6, TensorFlow-GPU 1.15.4, and Python 3.6, and the hardware environment for model training is a GeForce RTX 3090 and an Intel Xeon E3 processor.
In order to fully test the performance of MRCapsNet on simple and complex data sets, the following three data sets were selected for testing.
1. MNIST: the dataset consists of 70K grayscale handwritten-digit images in 10 classes, 60K for training and 10K for testing; image size is 28 × 28.
2. CIFAR-10: the dataset contains 60K color images of size 32 × 32 × 3 in 10 classes, with 50K images for training and 10K for testing. The images in CIFAR-10 carry complex feature and background information, and can well test network performance under strong interference.
3. Street View House Numbers (SVHN): the dataset contains color images of size 32 × 32 × 3 in 10 classes, with a 73K training set and a 26K test set. The images come from house numbers in Google Street View and test the performance of a neural network in real natural scenes.
During training, this example uses the Adam optimizer with an initial learning rate of 0.001 that decays to 0.9 times its value after each epoch, a batch size of 128, and 100 training iterations. These parameters are kept unchanged across all datasets when training MRCapsNet.
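The learning-rate schedule described above amounts to a simple exponential decay, sketched as:

```python
def learning_rate(epoch, lr0=0.001, decay=0.9):
    """Learning rate after `epoch` completed epochs: multiplied by 0.9 each epoch."""
    return lr0 * decay ** epoch
```

For example, the rate starts at 0.001 and falls to 0.001 × 0.9² = 0.00081 after two epochs.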
The experiments compare the accuracy of MRCapsNet with that of the capsule network CapsNet; the results are shown in Table 2. As Table 2 shows, compared with the CapsNet shown in fig. 1, the single-model performance of MRCapsNet on CIFAR-10 and SVHN improves by 21.6% and 0.9% respectively; on MNIST it is slightly lower than CapsNet but remains at a high level. MRCapsNet has 16.01M parameters versus 22.48M for CapsNet, indicating that the multi-level residual module reuse structure proposed in this example can effectively learn the multi-level features in an image.
TABLE 2

Model       MNIST    SVHN     CIFAR-10
CapsNet     99.75%   95.70%   68.74%
MRCapsNet   99.70%   96.60%   90.34%
Next, this example demonstrates the superiority of the multi-stage ResBlock, hole convolution, and the transposed-convolution-based reconstruction network through ablation experiments. In the CNN field, research on residual networks indicates that multi-stage residual blocks can improve network performance to a certain extent. To understand how the number of residual capsule modules affects the performance of MRCapsNet, this embodiment performed a series of experiments. As shown in Table 3, when the number of residual capsule modules increases from 1 to 3, the accuracy of the network rises from 80.58% to 89.65%; when hole convolution is added to the residual capsule modules and the three capsule modules are spliced, the accuracy rises to 90.34%. As seen from the training-loss curves in fig. 9, the loss value decreases further with each level of ResCapsBlock, indicating that each level of ResCapsBlock learns new features.
On top of the comparison over the number of capsule modules, this embodiment adds a comparison experiment on hole convolution. The results in Table 3 show that with hole convolution, the precision of each level of residual capsule module improves and the overall network precision rises from 89.95% to 90.34%, an evident improvement. The shallow residual capsule module extracts more fine-grained feature information with its 3 × 3 convolution kernels, while applying hole convolution in the deep residual capsule modules enlarges the receptive field and extracts feature information over a larger range without increasing the computation.
TABLE 3

Model                                                    Test accuracy (%)
ResCapsBlock1 (without hole convolution)                 80.58
ResCapsBlock2 (without hole convolution)                 88.32
ResCapsBlock3 (without hole convolution)                 89.65
Merged ResCapsBlock1, 2, 3 (without hole convolution)    89.95
ResCapsBlock1 (hole convolution, dilation rate (1, 1))   80.59
ResCapsBlock2 (hole convolution, dilation rate (1, 1))   88.71
ResCapsBlock3 (hole convolution, dilation rate (2, 2))   89.80
Merged ResCapsBlock1, 2, 3 (with hole convolution)       90.34
In this embodiment, a reconstruction experiment is performed on the CIFAR-10 dataset with the reconstruction network proposed for MRCapsNet and the reconstruction network of the CapsNet baseline. As shown in fig. 10, the reconstruction network formed by multiple transposed convolution layers proposed in this embodiment has a smaller reconstruction error in the training phase; and as seen from Table 4, the parameter count of the transposed-convolution reconstruction sub-network is only 16.39% of that of the CapsNet baseline while achieving better performance.
TABLE 4

Reconstruction module                 Parameters   ACC
Transposed convolution (MRCapsNet)    303497       90.34%
Fully connected (CapsNet)             1851904      89.77%
The power consumption of the core computing unit of MRCapsNet comprises two parts: the power required to map the neural network weights onto the crossbar array, and the power of the memristive crossbar array during inference. The minimum and maximum resistances of the memristor, Ron and Roff, are 100 kΩ and 100 MΩ respectively. The memristor write voltage is 6.5 V, and the maximum write power of a single memristor is 0.5 µW. In the inference stage of the memristive crossbar array, the power differs because each memristor is written with a different weight; this example therefore estimates the maximum power of a memristor in the limiting case to be 6.9 µW, while a CMOS circuit realizing the same computation consumes close to 60 µW. The hardware implementation proposed in this example can therefore greatly reduce the power consumption of edge-side capsule network deployment.
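As a rough sanity check of the write-power figure (assuming the write power is V²/R across the high-resistance state; the patent does not state the formula it used):

```python
# Illustrative back-of-the-envelope estimate, not the patent's own calculation.
V_write = 6.5       # memristor write voltage, volts
R_off = 100e6       # high-resistance state, ohms
P_write = V_write ** 2 / R_off   # watts; comes out around 0.4 µW, the same
                                 # order as the 0.5 µW maximum quoted above
```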
In summary, on the algorithm side, the embodiments of the present invention provide a new capsule network structure, MRCapsNet. The ResCapsBlock module is constructed on the idea of residual learning, alleviating the vanishing-gradient problem as the network deepens; multi-level information reuse is realized by connecting multi-level ResCapsBlocks in series; and hole convolution introduced into the higher-level ResCapsBlocks enlarges the receptive field, helping the capsule network extract more features. The embodiments of the present invention also provide a reconstruction network built from transposed convolutions, which improves network performance while reducing the parameter count. Compared with current capsule network models, MRCapsNet achieves more advanced performance.
The embodiment of the invention also provides a capsule network hardware implementation scheme based on the memristor cross array, and the memristor cross array has the advantages of low power consumption and support for parallel matrix calculation, so that the problem of large calculation amount of the capsule network can be solved. The power consumption of the neurons in the circuit is analyzed and calculated, and the result shows that the power consumption of the circuit based on the memristor cross array is far smaller than that of a CMOS circuit.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. An image classification system based on a residual capsule neural network is characterized in that the residual capsule neural network is loaded; the residual capsule neural network comprises a first residual capsule module, a second residual capsule module, a third residual capsule module and a digital capsule layer; the first residual capsule module comprises a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module comprises a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module comprises a third residual convolution sub-network and a third initial capsule layer;
the first residual convolution sub-network performs convolution calculation on an input original image and then outputs a first group of feature maps to the second residual convolution sub-network and the first initial capsule layer, the first initial capsule layer converts the first group of feature maps into a first initial capsule group, and the digital capsule layer converts the first initial capsule group into a first digital capsule group;
the second residual convolution sub-network performs convolution calculation on the input first group of feature maps and outputs a second group of feature maps to the third residual convolution sub-network and the second initial capsule layer, the second initial capsule layer converts the second group of feature maps into a second initial capsule group, and the digital capsule layer converts the second initial capsule group into a second digital capsule group;
the third residual convolution sub-network performs convolution calculation on the input second group of feature maps and outputs a third group of feature maps to the third initial capsule layer, the third initial capsule layer converts the third group of feature maps into a third initial capsule group, and the digital capsule layer converts the third initial capsule group into a third digital capsule group;
the digital capsule layer splices the first initial capsule group, the second initial capsule group and the third initial capsule group to obtain a fourth initial capsule group and converts the fourth initial capsule group into a fourth digital capsule group;
and the digital capsule layer splices and fuses four groups of digital capsules in the digital capsule layer and outputs M new digital capsules for target classification and image reconstruction, wherein M is equal to the total classification number of the image classification.
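The three-level data flow of claim 1 can be sketched in code. The following is a minimal shape-level sketch with illustrative, assumed sizes (a 32×32×3 input, 64-channel feature maps, 8-dimensional capsules); the residual convolution sub-networks are stubbed out as shape-changing maps, since the claim only fixes how the modules are wired together:

```python
import numpy as np

def fake_residual_subnet(x, out_channels):
    # stand-in for a residual conv sub-network (two conv layers + skip);
    # only the output shape matters for this wiring sketch
    h, w = x.shape[:2]
    return np.zeros((h, w, out_channels))

def to_initial_capsules(fmap, caps_dim):
    # an initial capsule layer groups channels into caps_dim-dimensional vectors
    h, w, c = fmap.shape
    assert c % caps_dim == 0
    return fmap.reshape(h * w * (c // caps_dim), caps_dim)

img = np.zeros((32, 32, 3))            # hypothetical RGB input
f1 = fake_residual_subnet(img, 64)     # first module: feeds both branches
f2 = fake_residual_subnet(f1, 64)      # second module consumes f1
f3 = fake_residual_subnet(f2, 64)      # third module consumes f2
caps1 = to_initial_capsules(f1, 8)     # first initial capsule group
caps2 = to_initial_capsules(f2, 8)     # second initial capsule group
caps3 = to_initial_capsules(f3, 8)     # third initial capsule group
caps4 = np.concatenate([caps1, caps2, caps3])  # spliced fourth group
```

Each of the four groups is then converted by the digital capsule layer into a group of digital capsules, as the later claims describe.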
2. The residual capsule neural network-based image classification system according to claim 1, wherein: the residual capsule neural network further comprises a reconstruction module for generating a reconstructed image with the same size as the original image according to the M new digital capsules.
3. The system according to claim 2, wherein the loss function of the residual capsule neural network is:
L_total = Σ_k L_k + η·L_r

wherein L_total represents the total loss; L_k = T_k·max(0, m⁺ − ‖v_k‖)² + λ·(1 − T_k)·max(0, ‖v_k‖ − m⁻)² represents the margin loss of capsule k in the last capsule layer; T_k is the matching parameter of the class-k classification target corresponding to capsule k, with T_k = 1 if and only if the classification is correct, otherwise T_k = 0; v_k represents the activation vector of capsule k; m⁺, m⁻ and λ are capsule-vector length control parameters, with m⁺ = 0.9, m⁻ = 0.1, λ = 0.5; L_r represents the reconstruction loss, equal to the mean squared error between the pixels of the reconstructed image and those of the original image; and η represents the weight coefficient of the reconstruction loss in the total loss.
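The loss of claim 3 can be sketched directly. The values m⁺ = 0.9, m⁻ = 0.1 and λ = 0.5 come from the claim; the value of η and the array shapes below are illustrative assumptions, since the claim does not fix them:

```python
import numpy as np

def margin_loss(v, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v: (K, D) activation vectors of the K capsules in the last capsule layer
    # targets: (K,) matching parameters T_k (one-hot)
    norms = np.linalg.norm(v, axis=1)
    present = targets * np.maximum(0.0, m_pos - norms) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, norms - m_neg) ** 2
    return float(np.sum(present + absent))

def total_loss(v, targets, recon, original, eta=0.0005):
    # L_total = sum_k L_k + eta * L_r; eta = 0.0005 is an assumed value,
    # the claim only states that eta weights the reconstruction loss
    recon_mse = np.mean((recon - original) ** 2)  # L_r: pixel-wise MSE
    return margin_loss(v, targets) + eta * recon_mse
```

A correctly classified capsule with ‖v_k‖ ≥ 0.9 (and all others ≤ 0.1) contributes zero margin loss.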
4. The residual capsule neural network-based image classification system according to claim 3, wherein: the first residual convolution sub-network, the second residual convolution sub-network and the third residual convolution sub-network each comprise two convolutional layers and a skip connection, and each convolutional layer adopts same-padding convolution with dilated (atrous) convolution introduced; the digital capsule layer converts the initial capsules into the digital capsules through a dynamic routing algorithm and a squash activation function.
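A minimal sketch of the squash activation and dynamic routing referred to in claim 4, following the standard CapsNet formulation; the shapes and iteration count below are illustrative assumptions:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # shrink the vector length into (0, 1) while preserving direction:
    # v = (|s|^2 / (1 + |s|^2)) * s / |s|
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iters=3):
    # u_hat: (num_in, num_out, D) prediction vectors from lower capsules
    b = np.zeros(u_hat.shape[:2])             # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = np.einsum('io,iod->od', c, u_hat)  # weighted sum per output capsule
        v = squash(s)                          # squashed output capsules
        b = b + np.einsum('iod,od->io', u_hat, v)  # agreement update
    return v
```

The squash output length is always below 1, so it can be read as the probability that the entity represented by the capsule is present.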
5. The residual capsule neural network-based image classification system according to claim 4, wherein: in a convolutional layer of the first, second or third residual convolution sub-network, a matrix-vector multiplication operation with h inputs and o outputs is expressed by the following formula:
y_m = Σ_{n=1}^{h} w_mn · x_n,  m = 1, …, o    (8)

in formula (8), x_n represents the nth element of the input vector, w_mn represents the weight in the mth row and nth column of the weight matrix, and y_m represents the mth output in the output vector;
in the hardware design corresponding to the convolutional layer, a first 2D memristor cross array is adopted to realize matrix-vector multiplication operation with h inputs and o outputs;
the first 2D memristive crossbar array comprises h rows and o columns of memristors, x1~xhWill convert to the row input voltage V of the first 2D memristive crossbar arrayx1~VxhWeight wmnMapping as the conductance value of the memristor of the mth row and n columns in the first 2D memristive crossbar array, and input voltage VxnWith conductance value w of corresponding memristormnMultiplying, superposing the output current flowing through each memristor through a lead to obtain column output current, and converting the output current of each column into voltage V through a current-voltage conversion circuitymAnd then output.
6. The residual capsule neural network-based image classification system according to claim 5, wherein: for any initial capsule layer of the first initial capsule layer, the second initial capsule layer and the third initial capsule layer, the initial capsule layer is provided with R capsule units, each capsule unit comprises X convolution units, the convolution kernel size is U × U, and the operation of the initial capsule layer is realized by adopting a first 3D memristor cross array on the corresponding hardware design;
the first 3D memristor cross array comprises T layers of second 2D memristor cross arrays with structures similar to those of the first 2D memristor cross array, the T layers of second 2D memristor cross arrays correspond to T input feature maps, the specification of memristors of each layer of second 2D memristor cross arrays is (R X) columns and (U) rows, each layer of second 2D memristor cross arrays operates independently, the outputs of different layers of second 2D memristor cross arrays are connected in the same column, the output voltage of each layer of second 2D memristor cross arrays is summed with the outputs of other layers in the direction perpendicular to the column, and the T feature maps are converted into one-dimensional voltage output through the first 3D memristor cross arrays;
after one voltage sequence, the first 3D memristive crossbar array outputs R × X one-dimensional voltage signals, representing one X-dimensional vector for each of the R capsule units; after all voltage sequences have been applied through serial input, the initial capsule layer outputs Y × Y × R X-dimensional vectors, namely Y × Y × R initial capsules, where Y × Y × R represents the size of the output feature map.
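The layer-wise summation of claim 6 can be sketched numerically: each 2D layer computes its own column currents, and the same-column outputs of all T layers are added. The shapes below are illustrative assumptions:

```python
import numpy as np

def crossbar_3d(G_layers, v_layers):
    # G_layers: T conductance matrices, one per input feature map,
    #           each of shape (num_rows, num_cols)
    # v_layers: T input-voltage vectors, one per layer
    # every layer produces its own column currents; currents of the same
    # column index are summed across layers (outputs wired together)
    return sum(G.T @ v for G, v in zip(G_layers, v_layers))

# hypothetical T = 2 layers, 2 rows, 3 columns each
G_layers = [np.ones((2, 3)), 2.0 * np.ones((2, 3))]
v_layers = [np.array([1.0, 1.0]), np.array([1.0, 0.0])]
out = crossbar_3d(G_layers, v_layers)
```

The result is the one-dimensional output the claim describes: one summed value per column, covering all T feature maps at once.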
7. The residual capsule neural network-based image classification system according to claim 6, wherein: the digital capsule layer is provided with four sub-layers for converting the four groups of initial capsules into four corresponding groups of digital capsules; any sub-layer comprises M capsule units; the Y × Y × R X-dimensional vectors output by the initial capsule layers and the M capsule units of the sub-layer share Y × Y × R × M weight transformation matrices, and in the hardware design the weight transformation matrices are mapped to Y × Y × R independent second 3D memristive crossbar arrays for parallel operation;
each second 3D memristive crossbar array comprises M layers of third 2D memristive crossbar arrays whose structure is similar to that of the first 2D memristive crossbar array; each layer of the third 2D memristive crossbar array has V rows × W columns of memristors, the V rows corresponding to the V-dimensional input vector and the W columns to the W-dimensional output vector; finally the sub-layer outputs M W-dimensional vectors, namely M digital capsules of dimension W.
8. The residual capsule neural network-based image classification system according to claim 7, wherein: the dimensions of the digital capsules output by the four sub-layers of the digital capsule layer are W1, W2, W3 and W4; the digital capsule layer splices the four groups of digital capsules to obtain M new digital capsules of dimension W1 + W2 + W3 + W4 for target classification and image reconstruction.
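The splicing of claim 8 is a plain concatenation along the capsule dimension; a sketch with hypothetical values of M and W1–W4:

```python
import numpy as np

M = 10                      # assumed number of classes
dims = (8, 8, 8, 16)        # assumed W1..W4 for the four sub-layers
# four groups of M digital capsules, one group per sub-layer
groups = [np.zeros((M, w)) for w in dims]
# splice along the capsule dimension: M capsules of dim W1+W2+W3+W4
fused = np.concatenate(groups, axis=1)
```

Each of the M fused capsules now carries the multi-level features from all three residual capsule modules plus the spliced fourth group.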
9. The residual capsule neural network-based image classification system according to claim 3, wherein: the reconstruction module comprises a fully-connected layer and four transposed convolution layers, wherein each transposed convolution layer first zero-pads the input feature map according to a rule and then performs the convolution operation on the zero-padded feature map;
defining the spatial size of the input feature map of a transposed convolution layer as H × H with N channels and the convolution stride as S, the size of each channel of the input feature map after zero-padding is H_out × H_out, calculated as:

H_out = H + (H − 1) × (S − 1) + 2P + O    (11)
the padding rule is as follows: first, insert S − 1 zeros between every two adjacent pixels of the input feature map; then pad P rings of zeros around the feature map; finally, append O rows of zeros below the feature map and O columns of zeros to its right.
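The padding rule and formula (11) of claim 9 can be checked numerically; the function below is a sketch that pads a single channel according to the stated rule:

```python
import numpy as np

def padded_size(H, S, P, O):
    # formula (11): H_out = H + (H-1)(S-1) + 2P + O
    return H + (H - 1) * (S - 1) + 2 * P + O

def zero_pad(fmap, S, P, O):
    # fmap: (H, H) single channel of the input feature map
    H = fmap.shape[0]
    n = padded_size(H, S, P, O)
    out = np.zeros((n, n))
    # original pixels land at positions P, P+S, P+2S, ...:
    # S-1 zeros between pixels, P rings around, O extra rows/cols at bottom/right
    out[P:P + (H - 1) * S + 1:S, P:P + (H - 1) * S + 1:S] = fmap
    return out
```

For example, a 3×3 map with S = 2, P = 1, O = 1 is padded to 8×8, with the nine original pixels at odd positions 1, 3, 5.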
10. The image classification system based on the residual capsule neural network as claimed in any one of claims 1 to 9, wherein: the data set used to train and test the residual capsule neural network comprises more than ten thousand grayscale or RGB color images in M = 10 classes, drawn from the MNIST, Cifar-10 or SVHN data sets.
CN202111587100.5A 2021-12-23 2021-12-23 Image classification system based on residual error capsule neural network Pending CN114241245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111587100.5A CN114241245A (en) 2021-12-23 2021-12-23 Image classification system based on residual error capsule neural network

Publications (1)

Publication Number Publication Date
CN114241245A true CN114241245A (en) 2022-03-25

Family

ID=80761930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111587100.5A Pending CN114241245A (en) 2021-12-23 2021-12-23 Image classification system based on residual error capsule neural network

Country Status (1)

Country Link
CN (1) CN114241245A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781628A (en) * 2022-03-29 2022-07-22 清华大学 Memristor noise-based data enhancement method and device, electronic equipment and medium
CN114463161A (en) * 2022-04-12 2022-05-10 之江实验室 Method and device for processing continuous images through neural network based on memristor
CN114463161B (en) * 2022-04-12 2022-09-13 之江实验室 Method and device for processing continuous images by neural network based on memristor
CN115545110A (en) * 2022-10-12 2022-12-30 中国电信股份有限公司 High resolution data reconstruction method for generating countermeasure network and related method and device
CN115545110B (en) * 2022-10-12 2024-02-02 中国电信股份有限公司 High resolution data reconstruction method for generating an antagonism network and related method and apparatus

Similar Documents

Publication Publication Date Title
US9646243B1 (en) Convolutional neural networks using resistive processing unit array
CN109460817B (en) Convolutional neural network on-chip learning system based on nonvolatile memory
CN113011499B (en) Hyperspectral remote sensing image classification method based on double-attention machine system
AU2021254524B2 (en) An improved spiking neural network
CN114241245A (en) Image classification system based on residual error capsule neural network
Chen et al. ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks
Pang et al. Deep convolutional extreme learning machine and its application in handwritten digit classification
CN108734271A (en) Neuromorphic weight unit and forming process thereof and artificial neural network
US11087204B2 (en) Resistive processing unit with multiple weight readers
US20200117986A1 (en) Efficient processing of convolutional neural network layers using analog-memory-based hardware
Patel et al. Neural network with deep learning architectures
US20210383203A1 (en) Apparatus and method with neural network
KR20200069901A (en) A method for slicing a neural network and a neuromorphic apparatus
KR20210143614A (en) Neuromorphic device for implementing neural network and method for thereof
He et al. Memristive residual capsnet: A hardware friendly multi-level capsule network
Zhang et al. Memristive fuzzy deep learning systems
KR20210158697A (en) A neuromorphic apparatus and a method for implementing a neural network using the neuromorphic apparatus
Kendall et al. Deep learning in memristive nanowire networks
JP2023526915A (en) Efficient Tile Mapping for Rowwise Convolutional Neural Network Mapping for Analog Artificial Intelligence Network Inference
Lee et al. Power and Area-Efficient XNOR-AND Hybrid Binary Neural Networks Using TFT-Type Synaptic Device
Li et al. A neuromorphic computing system for bitwise neural networks based on ReRAM synaptic array
Hollósi et al. Training capsule networks with various parameters
Theodoridis et al. A Tour to Deep Learning: From the Origins to Cutting Edge Research and Open Challenges
Le et al. Recurrent level set networks for instance segmentation
EP4089585A1 (en) Device and method with multidimensional vector neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination