CN110263855A - Method for image classification using common-basis capsule projection - Google Patents

Method for image classification using common-basis capsule projection

Info

Publication number
CN110263855A
CN110263855A (application CN201910538745.6A)
Authority
CN
China
Prior art keywords
capsule
vector
subspace
projection
common basis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910538745.6A
Other languages
Chinese (zh)
Other versions
CN110263855B (en)
Inventor
邹文斌
彭文韬
向灿群
徐晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201910538745.6A priority Critical patent/CN110263855B/en
Publication of CN110263855A publication Critical patent/CN110263855A/en
Application granted granted Critical
Publication of CN110263855B publication Critical patent/CN110263855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

The invention belongs to the technical field of image classification and discloses a method for image classification using common-basis capsule projection, comprising the following steps: (1) extracting features of an input image with a multi-layer convolutional network to obtain feature maps; (2) mapping the feature maps to a one-dimensional feature vector X; (3) performing a feature transformation on the feature vector X, splitting X into N groups, and combining the group vectors into a feature matrix; (4) performing common-basis capsule projection of the feature matrix onto multiple capsule subspaces, computing the sum of the lengths of the projected vectors in each subspace, and predicting the image class according to the magnitude of that sum. The invention uses the common-basis capsule projection idea to project the features onto multiple capsule subspaces before making the classification prediction. Experiments show that the network adapts to images of various scales and achieves very good classification results even when trained with a relatively small dataset.

Description

Method for image classification using common-basis capsule projection
Technical field
The invention belongs to the technical field of image classification, and more particularly relates to a method for image classification using common-basis capsule projection.
Background technique
In recent years, the convolutional neural networks of deep learning have been applied widely in fields such as computer vision, natural language processing and big data analysis, with results that far exceed expectations. In computer vision in particular, convolutional neural networks (CNNs) are favored by many researchers and practitioners because of their outstanding performance in tasks such as object recognition and object classification.
However, research has revealed an essential defect of convolutional neural networks: when the images to be classified closely resemble the training images, a CNN performs very well, but when an image is flipped, tilted or otherwise changed in orientation, its performance degrades. The reason is that a CNN cannot take the spatial relationships between underlying objects into account: what one layer's neuron passes to a neuron of the next layer is a scalar, and a scalar has only a magnitude and no direction, so it cannot represent the positional and orientational relationships between high-level and low-level features. At the same time, although the pooling layers of a CNN keep the features invariant under translation and rotation, they also discard a large amount of valuable information and reduce the spatial resolution, so the output barely changes for small changes of the input. Convolutional neural networks therefore have considerable limitations.
To address this limitation, at the end of 2017 Hinton published the paper "Dynamic Routing Between Capsules", proposing a deeper algorithm and the capsule network architecture. A capsule network uses neural capsule units, so that what one layer's capsule outputs to a capsule of the next layer is a vector. A vector has not only a magnitude but also a direction, which can represent the orientation of a feature, thereby establishing the correspondence between features in space and largely making up for the deficiency of convolutional neural networks. Compared with the weak spatial correlation of CNN features, the vectorized features of a capsule network are considered to express the spatial correlations between features well.
Summary of the invention
In view of the drawbacks of the prior art, the purpose of the present invention is to provide a method for image classification using common-basis capsule projection, aiming to solve the problem that the convolutional neural networks used in the prior art lose a large amount of valuable information and therefore classify inaccurately.
The present invention provides a method for image classification using common-basis capsule projection, comprising the following steps:
(1) extracting features of the input image with a multi-layer convolutional network to obtain feature maps;
(2) mapping the feature maps to a one-dimensional feature vector X;
(3) performing a feature transformation on the feature vector X, splitting X into N groups, and combining the group vectors into a feature matrix [x_1, x_2, ..., x_N];
(4) performing common-basis capsule projection of the feature matrix onto multiple capsule subspaces, computing the sum of the lengths of the projected vectors in each subspace, and making the image classification prediction according to the magnitude of that sum.
Most networks currently used for image classification extract image features with a convolutional neural network and then make the class prediction with fully connected layers. The features produced by convolution, however, are scalars, and a scalar has only a magnitude and no direction, so the features lack spatial information. The capsule projection network of this application instead classifies in vector form: the features processed by the capsule projection network are vectors, which have both magnitude and direction and can retain spatial information to a certain extent. They are therefore better suited to classification and can improve classification accuracy.
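The overall pipeline can be summarized as follows. This is a minimal sketch only, assuming PyTorch; the class name CapsuleProjectionClassifier and the injected projection module are illustrative and do not come from the patent itself (a common-basis projection module of the kind sketched later in the detailed description would be passed in).

```python
import torch
import torch.nn as nn

class CapsuleProjectionClassifier(nn.Module):
    """Sketch of steps (1)-(4): CNN feature maps -> 1-D feature vector X -> capsule projection scores."""
    def __init__(self, backbone: nn.Module, projection: nn.Module):
        super().__init__()
        self.backbone = backbone      # step (1): multi-layer convolutional network producing feature maps
        self.projection = projection  # steps (3)-(4): splits X into N groups and projects onto L subspaces

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(images)           # (B, C, W, H) feature maps
        x = torch.flatten(fmap, start_dim=1)   # step (2): one-dimensional feature vector X per sample
        return self.projection(x)              # class scores: summed capsule lengths per subspace
```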
When an L-class prediction is to be made, the number of capsule subspaces is L.
Further, every group of vectors in the feature matrix [x_1, x_2, ..., x_N] is projected with the same set of bases.
Capsule networks in the prior art have a large number of parameters, are slow in training and prediction, and are difficult to generalize to deep networks. Against these defects the present invention proposes the "common-basis" capsule projection idea: using common-basis capsule projection, the features are projected onto multiple capsule subspaces and the classification prediction is then made. The method is therefore not easily disturbed by overlapping objects of multiple classes, can effectively handle crowded scenes containing overlapping objects, and can improve the accuracy of the classification task.
Further, in step (3), the dimension of the feature vector X is d, and the dimension of each group vector in the feature matrix is d/N.
Further, in step (4), a set of projection basis matrices W_l ∈ R^{(d/N)×c} is learned, and the feature vectors are projected with these basis matrices into the capsule subspace S_l corresponding to each class, giving the projected capsule subspace vectors {v_1, v_2, ..., v_L} of dimension c.
The capsule subspace search model is:
$v_l = \arg\min_{v \in \mathrm{span}(W_l)} \|x - v\|_2$ ……(1)
Formula (1) means that within the subspace span(W_l) an optimal projection vector v_l is sought such that the error between v_l and the input vector x is minimal.
To find a suitable set of bases W_l, the following constraint is used:
$v_l = P_l x,\; P_l = W_l W_l^{+}$ ……(2)
where P_l is the projection matrix of the capsule subspace S_l (S_l = span(W_l)) and W_l^+ is the generalized inverse matrix of W_l; when the columns of W_l are linearly independent, W_l^+ = (W_l^T W_l)^{-1} W_l^T.
The length of the projected capsule v_l is then computed by the following formula:
$\|v_l\|_2 = \sqrt{x^T W_l \Sigma_l W_l^T x}$ ……(3)
where Σ_l = (W_l^T W_l)^{-1} can be regarded as a weight regularization term.
After the length ‖v_l‖_2 of the projection vector in each subspace is obtained, the optimal subspace of each class is found with the cross-entropy loss:
$\min_{\{W_l\}} -\log \frac{\exp(\|v_y\|_2)}{\sum_{l=1}^{L}\exp(\|v_l\|_2)}$ ……(4)
where v_y is the projection vector of the input vector x in the correct-class subspace S_y.
The gradient of the bases of a subspace is computed with the following formula:
$\frac{\partial \|v_l\|_2}{\partial W_l} = \frac{x_\perp\, x^T (W_l^{+})^T}{\|v_l\|_2}$ ……(5)
where x_⊥ = x − v_l = x − P_l x = (I − P_l)x. The update of the bases of a subspace is therefore guided by the component of the input orthogonal to that subspace: when the orthogonal component x_⊥ is 0, the gradient of the bases is 0, at which point the bases W_l are optimal and can retain all the information of the original input x.
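As a small numerical check of formulas (2) and (3) (a sketch assuming NumPy; the dimensions below are arbitrary illustrative values, not taken from the patent), the length given by formula (3) should equal the norm of the orthogonal projection P_l x:

```python
import numpy as np

rng = np.random.default_rng(0)
d_over_N, c = 32, 4                          # illustrative group dimension d/N and subspace dimension c
W = rng.standard_normal((d_over_N, c))       # one basis matrix W_l with linearly independent columns
x = rng.standard_normal(d_over_N)            # one group vector

W_pinv = np.linalg.inv(W.T @ W) @ W.T        # W_l^+ = (W_l^T W_l)^{-1} W_l^T
P = W @ W_pinv                               # projection matrix P_l = W_l W_l^+, formula (2)
v = P @ x                                    # projected capsule v_l = P_l x

Sigma = np.linalg.inv(W.T @ W)               # Σ_l = (W_l^T W_l)^{-1}
length = np.sqrt(x @ W @ Sigma @ W.T @ x)    # formula (3)

assert np.isclose(length, np.linalg.norm(v)) # both expressions give ‖v_l‖_2
```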
Capsule networks in the prior art have a large number of parameters, process slowly, are difficult to generalize to very deep structures, and perform poorly on large-scale images. The present invention avoids directly aggregating the channels of the feature layer into several capsules (the current capsule-network approach); instead it uses the common-basis capsule projection idea to project the features onto multiple capsule subspaces before making the classification prediction. Experiments show that the network adapts to images of various scales and reaches very good results even when trained with a relatively small dataset. In addition, grouping the feature vector and then performing the common-basis projection reduces the complexity of the network, reduces the number of parameters, and speeds up network training and prediction.
Detailed description of the invention
Fig. 1 is a flowchart of an implementation of the method for image classification using common-basis capsule projection provided by the present invention;
Fig. 2 is a schematic diagram of an implementation of the method for image classification using common-basis capsule projection provided by an embodiment of the present invention;
Fig. 3 is a projection diagram of a capsule provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of orthogonal-component-guided gradient updating provided by an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
Addressing the deficiencies of existing capsule networks, namely a large number of parameters, slow processing and difficulty in generalizing to very deep structures, the present invention proposes a Capsule Common-base Projection Network. This network preserves the detailed attribute information of the input object (position, rotation, size and so on) inside the network, so the same object can still be correctly recognized even after it has been translated, rotated or scaled. Moreover, because the vectorized features of the capsule projection network are strongly correlated and contain spatial correlation information such as the pose and deformation of the extracted features, the network is not easily disturbed by overlapping objects of multiple classes and can effectively handle crowded scenes containing overlapping objects.
The network can also be extended to text classification tasks; in multi-label classification the performance of capsule networks far exceeds that of convolutional neural networks (CNN) and long short-term memory networks (LSTM). Alipay found that applying capsule networks to its complaint-text model gives better overall performance than the previous networks (such as LSTM, Bi-LSTM and CNN-rand).
In addition, the network uses the common-basis idea: the feature vector is split into several groups, and the same set of bases is used to project them onto multiple subspaces. The network therefore does not need huge amounts of training data to learn how to recognize target objects effectively in all situations; good generalization can be obtained with only a small amount of training data.
In terms of reconstruction, a network based on common-basis capsule projection can accurately reconstruct objects even in scenes with heavy occlusion.
Although the use of capsule networks in practical scenarios is still at an early stage, their distinctive characteristics mean that capsule networks will have broad application prospects in fields such as computer vision and natural language processing.
For image classification tasks, current deep-learning methods extract features with convolutional layers, map the feature maps produced by the convolutional layers to a feature vector of fixed length, and then attach several fully connected layers for classification. For example, the ImageNet model of AlexNet outputs a 1000-dimensional vector representing the probability that the input image belongs to each class (softmax normalization). The features extracted by convolutional neural networks, however, lack spatial relevance. The present invention does not feed the convolutional features into a fully connected network, and it also avoids directly aggregating the channels of the feature layer into several capsules (the approach taken by current capsule networks). Instead it uses the common-basis capsule projection idea: the features are split into several groups of vectors and common-basis capsule projection is then performed, so that the features are projected onto multiple capsule subspaces before the classification prediction is made. Experiments prove that this network can further improve the accuracy of the classification task.
Meanwhile the classification accuracy of capsule cobasis projection network of the invention can exceed that other mainstream network structures, this Also a new approaches are indicated to improve the performance of depth network.
Fig. 1 and Fig. 2 respectively show the implementation flow of the method for image classification using common-basis capsule projection provided by an embodiment of the present invention. For ease of description only the parts relevant to the embodiments of the present invention are shown; the details, in combination with the drawings, are as follows:
The method for image classification using common-basis capsule projection provided by an embodiment of the present invention comprises the following steps:
(1) extracting features of the input image with a multi-layer convolutional network to obtain feature maps;
Here the features are the feature maps extracted by the convolutional and pooling layers of a convolutional neural network. In the embodiments of the present invention the backbone of the convolutional neural network can be VGG, GoogLeNet, ResNet, DenseNet and so on; the specific network framework can be chosen as needed.
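For example, a ResNet with its classification head removed could serve as the multi-layer convolutional network of step (1). This is a sketch under the assumption that a recent torchvision is available; any of the backbones named above could be substituted.

```python
import torch.nn as nn
from torchvision import models

def build_backbone(name: str = "resnet18") -> nn.Module:
    """Return a convolutional feature extractor that outputs (B, C, W, H) feature maps."""
    if name == "resnet18":
        net = models.resnet18(weights=None)  # VGG, GoogLeNet, DenseNet, etc. would work similarly
        # drop the global average pooling and the fully connected classifier, keep the conv/pool stages
        return nn.Sequential(*list(net.children())[:-2])
    raise ValueError(f"unsupported backbone: {name}")
```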
The feature maps extracted from the image by the convolutional neural network form a four-dimensional tensor (B, C, W, H), where B is the batch size of the samples, C is the number of channels, W is the width and H is the height. The feature maps carry the detailed feature information of the image, and this information helps with the classification prediction.
Extracting features with a CNN has inherent advantages: the convolutional layers extract rich semantic features from the image, the pooling layers then reduce the number of network parameters, and finally fully connected layers interpret the features.
In the embodiments of the present invention the feature maps can also be extracted by other methods, for example traditional machine-learning methods (decision tree classifiers, random forest classifiers, k-nearest-neighbour classifiers, multi-layer perceptrons (MLP) and so on) or recurrent neural networks (RNN), but deep-learning approaches to image classification most often use a CNN.
(2) mapping the feature maps produced by the convolutional layers to a feature vector X of fixed length;
The feature maps produced by the convolutional neural network form a four-dimensional tensor (B, C, W, H), where B is the batch size of the samples, C is the number of channels, W is the width and H is the height. For a classification task this four-dimensional tensor is usually first flattened into a one-dimensional vector, after which the classification prediction is made by a fully connected network.
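A minimal sketch of this flattening step (PyTorch assumed; the shapes are illustrative only):

```python
import torch

fmap = torch.randn(8, 256, 7, 7)       # (B, C, W, H) feature maps from the convolutional network
X = torch.flatten(fmap, start_dim=1)   # one feature vector X per sample
print(X.shape)                         # torch.Size([8, 12544]), i.e. d = C * W * H = 12544
```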
(3) performing a feature transformation on the feature vector X: splitting the feature vector X into N groups and combining the group vectors into the feature matrix [x_1, x_2, ..., x_N];
(4) performing common-basis capsule projection of the feature matrix: projecting onto multiple capsule subspaces, computing the sum of the lengths of the projected vectors in each subspace, and making the image classification prediction according to the magnitude of that sum.
The feature matrix is orthogonally projected onto multiple capsule subspaces (if an L-class prediction is to be made, the number of subspaces is L). The projection loses essentially no information, and the capsule subspaces can contain more new feature information, so the network structure can be trained more effectively. During the projection, every group of vectors [x_1, x_2, ..., x_N] in the feature matrix is projected with the same set of bases, which reduces the number of parameters, thereby reducing the complexity of the network and accelerating training and convergence.
For the image classification task of the present invention, using the common-basis capsule projection network not only increases the prediction accuracy but also reduces the number of parameters and thus speeds up recognition.
In the embodiments of the present invention the feature matrix is orthogonally projected onto multiple capsule subspaces (if an L-class prediction is to be made, the number of subspaces is L). Only a very small part of the information is lost during the projection, and the capsule subspaces can contain more new feature information, so the network structure can be trained more effectively. During the projection, every group of vectors in the feature matrix [x_1, x_2, ..., x_N] is projected with the same set of bases, which reduces the parameters (specifically, the parameters of the projection basis matrices), thereby reducing the complexity of the network and accelerating training and convergence. Because the capsule network retains the detailed spatial information of the image, it also has application prospects in computer-vision fields such as localization, object detection, semantic segmentation and instance segmentation.
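As a back-of-the-envelope illustration of this parameter saving (the concrete numbers are hypothetical and not taken from the patent): with feature dimension d, N groups, subspace dimension c and L classes, sharing one basis matrix per class across all N groups needs

$L \cdot \frac{d}{N} \cdot c$ projection parameters instead of $L \cdot d \cdot c$ for one full-dimensional basis per class,

an N-fold reduction. For instance, with d = 512, N = 4, c = 16 and L = 10 this is 10 × 128 × 16 = 20,480 parameters rather than 10 × 512 × 16 = 81,920.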
Here a "basis" means a set of basis vectors; in any space a set of basis vectors can be found that expresses every vector of that space. In the present invention this loss is reduced through network optimization, so that the final projection result keeps as much of the original information as possible.
Fig. 3 shows the projection diagram of a capsule provided by an embodiment of the present invention; in the figure N is 4, meaning that the feature vector is split into 4 groups before the common-basis capsule projection is carried out. This is described in detail below.
In the embodiments of the present invention the specific projection process is as follows:
X is the feature vector obtained by transforming the feature maps, with vector dimension d. The feature vector is split into N groups, forming the feature matrix {x_1, x_2, ..., x_N}; each group vector in the matrix has dimension d/N. d is a parameter, typically an integer greater than 1, and the number of groups into which the features are split can be set as desired.
In order to learn the features of each class, the network ultimately learns a set of capsule subspaces {S_1, S_2, ..., S_L}, where L is the predefined number of classes. A set of projection basis matrices W_l ∈ R^{(d/N)×c} is learned, and the feature vectors are projected with these basis matrices into the capsule subspace S_l corresponding to each class, finally giving the projected capsule subspace vectors {v_1, v_2, ..., v_L} of dimension c. To learn discriminative features, constrained optimization makes the orthogonal bases of the capsule subspaces preserve as much of the original feature information as possible; the length of the projected subspace vector v_l indicates the probability that the class occurs, and its direction indicates the attributes of the class. The capsule subspace search model is:
$v_l = \arg\min_{v \in \mathrm{span}(W_l)} \|x - v\|_2$ ……(1)
Formula (1) means that within the subspace span(W_l) an optimal projection vector v_l is sought such that the error between v_l and the input vector x is minimal; in other words, the vector projected into the subspace should preserve as much of the information of the original input as possible. In order to find a suitable set of bases W_l satisfying formula (1), we impose the following constraint:
$v_l = P_l x,\; P_l = W_l W_l^{+}$ ……(2)
where P_l is the projection matrix of the capsule subspace S_l (S_l = span(W_l)) and W_l^+ is the generalized inverse matrix of W_l. When the columns of W_l are linearly independent, W_l^+ = (W_l^T W_l)^{-1} W_l^T. The length of the projected capsule v_l can therefore be computed directly by the following formula:
$\|v_l\|_2 = \sqrt{x^T W_l \Sigma_l W_l^T x}$ ……(3)
In formula (3), Σ_l = (W_l^T W_l)^{-1} can be regarded as a weight regularization term. After the length ‖v_l‖_2 of the projection vector in each subspace is obtained, the optimal subspace of each class is found with the cross-entropy loss:
$\min_{\{W_l\}} -\log \frac{\exp(\|v_y\|_2)}{\sum_{l=1}^{L}\exp(\|v_l\|_2)}$ ……(4)
where v_y is the projection vector of the input vector x in the correct-class subspace S_y. The gradient of the bases of a subspace is computed as follows:
$\frac{\partial \|v_l\|_2}{\partial W_l} = \frac{x_\perp\, x^T (W_l^{+})^T}{\|v_l\|_2}$ ……(5)
As shown in Fig. 4, x_⊥ = x − v_l = x − P_l x = (I − P_l)x, so the update of the bases of a subspace is guided by the component of the projection vector orthogonal to that subspace: when the orthogonal component x_⊥ is 0, the gradient of the bases is 0, at which point the bases W_l are optimal and can retain all the information of the original input x.
Fig. 4 shows the schematic diagram of orthogonal-component-guided gradient updating provided by an embodiment of the present invention. When the optimal bases of the capsule subspaces are being sought, the update of the basis vectors is guided by the orthogonal component; when the orthogonal component tends to 0, the network has learned the optimal bases. For each capsule subspace, once the optimal bases are obtained, the sum of the lengths of the vectors projected with them is computed, and this number represents the final classification probability.
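Putting the above together, the following is a minimal sketch (PyTorch assumed; the class and variable names are illustrative and not from the patent) of a common-basis capsule projection layer: the feature vector is split into N groups, every group is projected into each of the L capsule subspaces with the same per-class basis W_l, and the lengths of the N projected vectors are summed to give one score per class. The scores can be fed directly to the softmax cross-entropy of formula (4), and automatic differentiation then realizes the gradient of formula (5).

```python
import torch
import torch.nn as nn

class CommonBasisCapsuleProjection(nn.Module):
    """Projects the N groups of a feature vector onto L capsule subspaces with a shared per-class basis."""
    def __init__(self, feat_dim: int, num_groups: int, subspace_dim: int, num_classes: int):
        super().__init__()
        assert feat_dim % num_groups == 0, "d must be divisible by N"
        self.num_groups = num_groups
        group_dim = feat_dim // num_groups                    # d/N
        # One basis W_l in R^{(d/N) x c} per class, shared by all N groups (the common basis).
        self.W = nn.Parameter(0.01 * torch.randn(num_classes, group_dim, subspace_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B = x.shape[0]
        groups = x.view(B, self.num_groups, -1)               # (B, N, d/N): feature matrix {x_1, ..., x_N}
        gram = torch.einsum('ldc,lde->lce', self.W, self.W)   # W_l^T W_l, shape (L, c, c)
        sigma = torch.linalg.inv(gram)                        # Σ_l; a small ridge term may be needed in practice
        proj = torch.einsum('bnd,ldc->bnlc', groups, self.W)  # W_l^T x for every sample, group and class
        # ‖v_l‖_2^2 = x^T W_l Σ_l W_l^T x, formula (3), per sample, group and class
        sq_len = torch.einsum('bnlc,lce,bnle->bnl', proj, sigma, proj)
        lengths = sq_len.clamp_min(1e-12).sqrt()              # (B, N, L)
        return lengths.sum(dim=1)                             # class score: sum of lengths over the N groups

# Usage sketch: the class scores feed the cross-entropy loss of formula (4).
# layer = CommonBasisCapsuleProjection(feat_dim=512, num_groups=4, subspace_dim=16, num_classes=10)
# loss = nn.CrossEntropyLoss()(layer(torch.randn(8, 512)), torch.randint(0, 10, (8,)))
```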
The present invention avoids directly aggregating the channels of the feature layer into several capsules (the current capsule-network approach). Instead it uses the common-basis capsule projection idea, projecting the features onto multiple capsule subspaces before making the classification prediction. Experiments show that the network adapts to images of various scales and reaches very good results even when trained with a relatively small dataset. Furthermore, grouping the feature vector and then performing the common-basis projection reduces the complexity of the network, reduces the number of parameters, and speeds up network training and prediction.
Table 1: partial test results
Table 1 shows the experimental results obtained on the CIFAR-10 and CIFAR-100 datasets. The experimental analysis shows that the capsule common-basis projection network of the present invention not only improves the prediction accuracy of the classification task, but also reduces the number of network parameters and increases the speed of network training and prediction.
In conclusion, for image classification tasks current deep-learning methods extract features with convolutional layers, map the feature maps produced by the convolutional layers to a feature vector of fixed length, and then attach several fully connected layers for classification. For example, the ImageNet model of AlexNet outputs a 1000-dimensional vector representing the probability that the input image belongs to each class (softmax normalization). The features extracted by convolutional neural networks, however, lack spatial relevance. The present invention does not feed the convolutional features into a fully connected network, and it avoids directly aggregating the channels of the feature layer into several capsules (the approach taken by current capsule networks). Instead it uses the common-basis capsule projection idea, splitting the features into several groups of vectors and then performing common-basis capsule projection, so that the features are projected onto multiple capsule subspaces before the classification prediction is made. Experiments prove that this network can further improve the accuracy of the classification task.
As will be readily appreciated by those skilled in the art, the foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for image classification using common-basis capsule projection, characterized by comprising the following steps:
(1) extracting features of an input image with a multi-layer convolutional network to obtain feature maps;
(2) mapping the feature maps to a one-dimensional feature vector X;
(3) performing a feature transformation on the feature vector X, splitting the feature vector X into N groups, and combining the group vectors into a feature matrix [x_1, x_2, ..., x_N];
(4) performing common-basis capsule projection of the feature matrix, projecting onto multiple capsule subspaces, computing the sum of the lengths of the projected vectors in each subspace, and making the image classification prediction according to the magnitude of that sum.
2. The method according to claim 1, characterized in that when an L-class prediction is to be made, the number of capsule subspaces is L.
3. The method according to claim 1 or 2, characterized in that every group of vectors in the feature matrix [x_1, x_2, ..., x_N] is projected with the same set of bases.
4. The method according to claim 1, characterized in that in step (3) the vector dimension of the feature vector X is d and the dimension of each group vector in the feature matrix is d/N.
5. The method according to any one of claims 1-4, characterized in that in step (4) a set of projection basis matrices W_l ∈ R^{(d/N)×c} is learned and used to project the feature vectors into the capsule subspace S_l corresponding to each class, giving the projected capsule subspace vectors {v_1, v_2, ..., v_L} of dimension c.
6. The method according to claim 5, characterized in that in step (4) the capsule subspace search model is:
$v_l = \arg\min_{v \in \mathrm{span}(W_l)} \|x - v\|_2$ ……(1)
which means that within the subspace span(W_l) an optimal projection vector v_l is sought such that the error between v_l and the input vector x is minimal.
7. The method according to claim 5 or 6, characterized in that in step (4), in order to find a suitable set of bases W_l, the following constraint is used:
$v_l = P_l x,\; P_l = W_l W_l^{+}$ ……(2)
wherein P_l is the projection matrix of the capsule subspace S_l (S_l = span(W_l)) and W_l^+ is the generalized inverse matrix of W_l; when the columns of W_l are linearly independent, W_l^+ = (W_l^T W_l)^{-1} W_l^T.
8. The method according to any one of claims 5-7, characterized in that in step (4) the length of the projected capsule v_l is computed by the following formula:
$\|v_l\|_2 = \sqrt{x^T W_l \Sigma_l W_l^T x}$ ……(3)
wherein Σ_l = (W_l^T W_l)^{-1} can be regarded as a weight regularization term.
9. The method according to any one of claims 5-8, characterized in that in step (4), after the length ‖v_l‖_2 of the projection vector in each subspace is obtained, the optimal subspace of each class is found with the cross-entropy loss:
$\min_{\{W_l\}} -\log \frac{\exp(\|v_y\|_2)}{\sum_{l=1}^{L}\exp(\|v_l\|_2)}$ ……(4)
wherein v_y is the projection vector of the input vector x in the correct-class subspace S_y.
10. The method according to any one of claims 5-9, characterized in that in step (4) the gradient of the bases of a subspace is computed by the following formula:
$\frac{\partial \|v_l\|_2}{\partial W_l} = \frac{x_\perp\, x^T (W_l^{+})^T}{\|v_l\|_2}$ ……(5)
wherein x_⊥ = x − v_l = x − P_l x = (I − P_l)x; the update of the bases of a subspace is guided by the component of the projection vector orthogonal to that subspace, and when the orthogonal component x_⊥ is 0, the gradient of the bases is 0, at which point the bases W_l are optimal and can retain all the information of the original input x.
CN201910538745.6A 2019-06-20 2019-06-20 Method for classifying images by utilizing common-basis capsule projection Active CN110263855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910538745.6A CN110263855B (en) 2019-06-20 2019-06-20 Method for classifying images by utilizing common-basis capsule projection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910538745.6A CN110263855B (en) 2019-06-20 2019-06-20 Method for classifying images by utilizing common-basis capsule projection

Publications (2)

Publication Number Publication Date
CN110263855A true CN110263855A (en) 2019-09-20
CN110263855B CN110263855B (en) 2021-12-14

Family

ID=67919915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910538745.6A Active CN110263855B (en) 2019-06-20 2019-06-20 Method for classifying images by utilizing common-basis capsule projection

Country Status (1)

Country Link
CN (1) CN110263855B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180326A (en) * 2020-09-21 2021-01-05 南昌大学 Layered distributed positioning and speed measuring method based on large-scale antenna array
CN112256878A (en) * 2020-10-29 2021-01-22 沈阳农业大学 Rice knowledge text classification method based on deep convolution
CN112561869A (en) * 2020-12-09 2021-03-26 深圳大学 Pancreatic neuroendocrine tumor postoperative recurrence risk prediction method
CN116881828A (en) * 2023-07-19 2023-10-13 西华师范大学 Abnormal detection method of KNN algorithm based on subspace similarity


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764142A (en) * 2018-05-25 2018-11-06 北京工业大学 Unmanned plane image forest Smoke Detection based on 3DCNN and sorting technique
CN109241287A (en) * 2018-09-21 2019-01-18 中山大学 Textual classification model and method based on intensified learning and capsule network
CN109376636A (en) * 2018-10-15 2019-02-22 电子科技大学 Eye ground image classification method based on capsule network
CN109447146A (en) * 2018-10-24 2019-03-08 厦门美图之家科技有限公司 Classified optimization method and device
CN109543602A (en) * 2018-11-21 2019-03-29 太原理工大学 A kind of recognition methods again of the pedestrian based on multi-view image feature decomposition
CN109766553A (en) * 2019-01-09 2019-05-17 北京邮电大学 A kind of Chinese word cutting method of the capsule model combined based on more regularizations
CN109840560A (en) * 2019-01-25 2019-06-04 西安电子科技大学 Based on the image classification method for incorporating cluster in capsule network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liheng Zhang et al.: "CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces", arXiv:1805.07621v2 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112180326A (en) * 2020-09-21 2021-01-05 南昌大学 Layered distributed positioning and speed measuring method based on large-scale antenna array
CN112180326B (en) * 2020-09-21 2023-11-21 南昌大学 Hierarchical distributed positioning and speed measuring method based on large-scale antenna array
CN112256878A (en) * 2020-10-29 2021-01-22 沈阳农业大学 Rice knowledge text classification method based on deep convolution
CN112256878B (en) * 2020-10-29 2024-01-16 沈阳农业大学 Rice knowledge text classification method based on deep convolution
CN112561869A (en) * 2020-12-09 2021-03-26 深圳大学 Pancreatic neuroendocrine tumor postoperative recurrence risk prediction method
CN112561869B (en) * 2020-12-09 2021-11-02 深圳大学 Pancreatic neuroendocrine tumor postoperative recurrence risk prediction method
CN116881828A (en) * 2023-07-19 2023-10-13 西华师范大学 Abnormal detection method of KNN algorithm based on subspace similarity

Also Published As

Publication number Publication date
CN110263855B (en) 2021-12-14

Similar Documents

Publication Publication Date Title
Zhiqiang et al. A review of object detection based on convolutional neural network
CN111583263B (en) Point cloud segmentation method based on joint dynamic graph convolution
CN110263855A (en) A method of it is projected using cobasis capsule and carries out image classification
CN108021947B (en) A kind of layering extreme learning machine target identification method of view-based access control model
CN107871106A (en) Face detection method and device
CN110516095A (en) Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN110399895A (en) The method and apparatus of image recognition
Riba et al. Learning graph edit distance by graph neural networks
CN103617609B (en) Based on k-means non-linearity manifold cluster and the representative point choosing method of graph theory
CN109784405A (en) Cross-module state search method and system based on pseudo label study and semantic consistency
CN108805102A (en) A kind of video caption detection and recognition methods and system based on deep learning
CN109872331A (en) A kind of remote sensing image data automatic recognition classification method based on deep learning
Yu et al. Exemplar-based recursive instance segmentation with application to plant image analysis
Sahu et al. Dynamic routing using inter capsule routing protocol between capsules
Tan et al. Rapid fine-grained classification of butterflies based on FCM-KM and mask R-CNN fusion
Xu et al. Weakly supervised facial expression recognition via transferred DAL-CNN and active incremental learning
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN116258990A (en) Cross-modal affinity-based small sample reference video target segmentation method
CN113032613B (en) Three-dimensional model retrieval method based on interactive attention convolution neural network
Chen et al. Learning to segment object candidates via recursive neural networks
CN109102019A (en) Image classification method based on HP-Net convolutional neural networks
CN111144469B (en) End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network
Bahroun et al. Building efficient deep hebbian networks for image classification tasks
CN109271833A (en) Target identification method, device and electronic equipment based on the sparse self-encoding encoder of stack
Hu et al. Transformer Tracking via Frequency Fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant