CN109255381A - Image classification method based on a second-order VLAD sparse adaptive deep network - Google Patents

Image classification method based on a second-order VLAD sparse adaptive deep network

Info

Publication number: CN109255381A
Application number: CN201811038736.2A
Authority: CN (China)
Prior art keywords: VLAD, second-order, SASO, feature, VLADNet
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN109255381B
Inventors: 王倩倩, 陈博恒, 刘娇蛟, 马碧云
Current and original assignee: South China University of Technology (SCUT)
Application filed by South China University of Technology (SCUT)
Priority to CN201811038736.2A
Publication of CN109255381A; application granted; publication of CN109255381B

Classifications

    • G06F18/24 - Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Classification techniques
    • G06N3/045 - Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/08 - Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods


Abstract

The present invention proposes an image classification method based on a second-order VLAD sparse adaptive deep network, belonging to the fields of image classification and deep learning. The method first extracts convolutional features from multiple convolutional layers, then computes the corresponding SASO-VLAD encoding for each convolutional feature, and finally aggregates all SASO-VLAD encodings to construct the final multi-path feature encoding network. On the basis of the existing end-to-end VLAD encoding model, the method uses a new sparse adaptive soft-assignment coding scheme to obtain the weight coefficients, and uses the concatenation of first-order and second-order VLAD encodings as the final feature representation. Compared with the NetVLAD model, the sparsity strategy and the second-order representation of the invention effectively improve image classification, and the multi-path network trains multiple feature encoding networks simultaneously on low-, mid-, and high-level features, giving a stronger representation of image features than a single-level feature encoding network.

Description

Image classification method based on a second-order VLAD sparse adaptive deep network
Technical field
The invention belongs to the fields of image classification and deep learning, and in particular relates to an image classification method based on a second-order VLAD sparse adaptive deep network.
Background technique
Deep learning models have achieved excellent performance in computer vision; the main application directions include visual classification, super-resolution imaging, semantic segmentation, object detection, and visual tracking. Compared with traditional statistical learning methods, deep learning models have two major advantages: (1) end-to-end training yields weights that are better suited to a given computer vision task; and (2) the deep structural features learned from large-scale image datasets describe the original images better. Compared with traditional hand-crafted feature methods (such as SIFT or HOG features), deep-feature methods can significantly improve performance.
In view of the great advantages of end-to-end models and deep features, recent work has embedded the domain knowledge of traditional statistical learning methods into deep neural networks and trained the entire model in an end-to-end manner. These newly constructed neural networks not only inherit domain-specific knowledge but also make all parameters better suited to the final application task.
Feature encoding is a popular statistical learning method for visual classification. In the traditional feature encoding framework, the feature encoding method is the core component connecting feature extraction and feature pooling, and it has a large influence on visual classification performance. Popular feature encoding methods include hard coding, soft coding, convolutional sparse coding, locality-constrained coding, vector of locally aggregated descriptors (VLAD) coding, and so on. In traditional feature encoding methods, all algorithm components (feature extraction, dictionary learning, feature encoding, and classifier training) are independent of one another, so the learned parameters may not be optimal for image classification. In addition, the SIFT (scale-invariant feature transform) features used in traditional feature encoding methods cannot represent images well. Recently, traditional VLAD coding has been extended into an end-to-end model called NetVLAD. The NetVLAD layer is jointly trained with a deep CNN, yielding outstanding image classification and image retrieval results; in addition, the NetVLAD model has demonstrated its effectiveness in action classification. However, the existing NetVLAD model uses only first-order aggregation information at a single spatial scale, and the discriminative ability of end-to-end feature encoding networks has not yet been fully studied.
Summary of the invention
To overcome the shortcoming of the existing NetVLAD model, namely that the discriminative ability of end-to-end feature encoding networks has not yet been fully studied, the present invention proposes an image classification method based on a second-order VLAD sparse adaptive deep network. On the basis of the existing NetVLAD model, the method uses a new coding scheme, sparse adaptive soft-assignment coding (SASAC), to obtain the weight coefficients, and represents features jointly with first-order and second-order VLAD encodings in an end-to-end sparse adaptive second-order VLAD model (SASO-VLADNet). Convolutional features are extracted from multiple convolutional layers, the final feature encoding is generated by a multi-path feature encoding network (M-SASO-VLADNet) composed of multiple SASO-VLADNets, and a fully connected layer and a loss layer finally output the classification loss.
The object of the present invention is achieved by the following technical solution.
An image classification method based on a second-order VLAD sparse adaptive deep network, which uses an end-to-end-trained multi-path feature encoding network. First, nonlinear convolutional features are extracted from the activation functions following multiple convolutional layers; then the corresponding sparse adaptive second-order vector of locally aggregated descriptors (SASO-VLAD) encoding is computed for each convolutional feature; finally, all SASO-VLAD encodings are aggregated to construct the final multi-path feature encoding network (M-SASO-VLADNet), and a fully connected layer and a loss layer output the classification loss. The SASO-VLAD encoding uses sparse adaptive soft-assignment coding (SASAC) to obtain sparse weight coefficients, and the first-order and second-order VLAD encodings jointly represent the end-to-end sparse adaptive second-order VLAD model (SASO-VLADNet).
Further, in the new sparse adaptive soft-assignment coding (SASAC) method, the SASAC layer is a variant of a multidimensional Gaussian probability density function, and all of its parameters, including the dictionary and variance parameters, are learned adaptively in an end-to-end manner. The SASAC layer retains only the T largest probabilities and forces the other small probabilities to zero so as to obtain sparse weight coefficients.
Further, the end-to-end SASO-VLAD constitutes the SASO-VLADNet layer, which is built as follows:
Step 3.1: a specific convolutional-layer CNN feature F_i passes through the SASAC layer and the dimensionality reduction layer, and their outputs are multiplied to obtain the first-order statistics ξ_1(F_i);
Step 3.2: ξ_1(F_i) passes through an average pooling layer and is then L2-normalized; ξ_1(F_i) also passes through the second-order layer to obtain the second-order statistics ξ_2(F_i), which is L2-normalized; the two normalized outputs are concatenated and L2-normalized to obtain the final output. The dimensionality reduction method is the affine subspace method.
Further, the expression of the SASAC layer is:

λ_ij(k) = exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k) / Σ_{k′ ∈ S_T(f_ij)} exp(−‖a_{k′} ⊙ f_ij + b_{k′}‖₂² + v_{k′}) for k ∈ S_T(f_ij), and λ_ij(k) = 0 otherwise,

where ‖·‖₂ denotes the L2 norm of a vector, ⊙ denotes element-wise multiplication, F_i represents the descriptor set of a specific convolutional-layer feature of the i-th image (this descriptor set contains M descriptors in total), f_ij ∈ R^{D×1} is the j-th descriptor of F_i, D denotes the vector dimension, and a_k ∈ R^{D×1}, b_k ∈ R^{D×1}, v_k ∈ R (k = 1, 2, …, K) are respectively the weight of f_ij, the bias of f_ij, and the normalization bias; these parameters are all trainable parameters of SASO-VLADNet. There are K groups of these parameters in total, k denotes the index of a specific group, and k′ denotes the indices of the groups of parameters that satisfy the condition of the set S_T(f_ij).
S_T(f_ij) is the set satisfying the following condition: Card(S_T(f_ij)) = T, and every score exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k) with k ∈ S_T(f_ij) is no smaller than any score with index in the complement of S_T(f_ij); that is, S_T(f_ij) indexes the T largest scores. Here Card(S_T(f_ij)) is the number of elements of S_T(f_ij).
Further, the activation function may be one of the sigmoid, tanh, and ReLU functions.
Further, the expression of the first-order statistics ξ_1(F_i) is:

ξ_1(F_i) = [Σ_{j=1}^{M} λ_ij(1)(U_1 f_ij + μ_1); Σ_{j=1}^{M} λ_ij(2)(U_2 f_ij + μ_2); …; Σ_{j=1}^{M} λ_ij(K)(U_K f_ij + μ_K)],

where F_i represents the descriptor set of a specific convolutional-layer feature of the i-th image (this descriptor set contains M descriptors in total), f_ij ∈ R^{D×1} is the j-th descriptor of F_i, D denotes the vector dimension, λ_ij(k) is the coding coefficient of the SASAC layer described above, and U_k, μ_k are the dimensionality reduction matrices and biases in the first-order statistics. There are K groups of dimensionality reduction matrices and biases in total, k denotes the index of a specific group, and (U_k f_ij + μ_k) denotes the k-th affine subspace layer. The dimensionality reduction matrices and biases are all trainable parameters of SASO-VLADNet.
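A minimal NumPy sketch of this first-order aggregation (assuming the first-order expression described above; function names and shapes are invented for the example):

```python
import numpy as np

def first_order_vlad(F, Lam, U, mu):
    """First-order statistics xi_1(F_i).

    F: (M, D) descriptor set; Lam: (M, K) SASAC weights lambda_ij(k);
    U: (K, P, D) projection matrices; mu: (K, P) biases.
    Returns the concatenated (K*P,) first-order encoding.
    """
    K = U.shape[0]
    parts = []
    for k in range(K):
        proj = F @ U[k].T + mu[k]                           # (M, P) affine subspace
        parts.append((Lam[:, k:k + 1] * proj).sum(axis=0))  # weighted sum over j
    return np.concatenate(parts)

# sanity check: one group, identity projection, unit weights -> column sums
F = np.arange(6.0).reshape(2, 3)
xi1 = first_order_vlad(F, np.ones((2, 1)), np.eye(3)[None], np.zeros((1, 3)))
assert np.allclose(xi1, [3.0, 5.0, 7.0])
```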
Further, the second-order statistics ξ_2(F_i) use a covariance matrix to obtain the interaction features between channels. The expression of ξ_2(F_i) is:

ξ_2(F_i) = vec(Σ_{k=1}^{K} Σ_{j=1}^{M} λ_ij(k)(U_k f_ij + μ_k)(U_k f_ij + μ_k)^T),

where vec is the vectorization operation that converts a matrix into the corresponding column vector.
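This channel-interaction pooling can be sketched as follows (a hypothetical NumPy rendering with invented names and shapes; the exact form of the patented layer may differ):

```python
import numpy as np

def second_order_vlad(F, Lam, U, mu):
    """Second-order statistics xi_2(F_i): weighted outer products of the
    projected descriptors, vectorized into a (P*P,) column vector."""
    K, P, _ = U.shape
    S = np.zeros((P, P))
    for k in range(K):
        proj = F @ U[k].T + mu[k]                      # (M, P) affine subspace
        S += (Lam[:, k][:, None] * proj).T @ proj      # sum_j lambda * r r^T
    return S.reshape(-1)                               # vec(.)

# sanity check: orthonormal descriptors give vec of the identity matrix
F = np.eye(2)
xi2 = second_order_vlad(F, np.ones((2, 1)), np.eye(2)[None], np.zeros((1, 2)))
assert np.allclose(xi2, [1.0, 0.0, 0.0, 1.0])
```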
Further, the forward pass of the SASO-VLADNet model first computes the final loss of the deep network, and the gradients of the loss with respect to each parameter are then back-propagated to the input to update the SASO-VLADNet layers. The output classification loss is the standard softmax loss.
Further, the multi-path feature encoding network (M-SASO-VLADNet) trains multiple feature encoding networks simultaneously using convolutional features of multiple levels (low, middle, and high).
Further, the parameter update steps of the complete model include:
Step 1: obtain initialization parameters for each SASO-VLADNet layer;
Step 2: initialize the weights of the final fully connected layer from each SASO-VLADNet encoding and the final softmax classifier;
Step 3: using the above initialization parameters and end-to-end training, the gradient information of the softmax classifier is used to update the parameters of every layer in M-SASO-VLADNet until the classifier loss curve converges.
Compared with the prior art, the proposed image classification method based on a second-order VLAD sparse adaptive deep network has the following advantages:
Compared with the NetVLAD model, the sparsity strategy and the second-order representation of the invention effectively improve image classification performance, and the multi-path network trains multiple feature encoding networks simultaneously on low-, mid-, and high-level features, giving a stronger representation of image features than a single-level feature encoding network.
Description of the drawings
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 is the network structure of the SASO-VLADNet layer in the method of the invention;
Fig. 3 is the network structure of M-SASO-VLADNet in the method of the invention.
Specific embodiment
To clearly illustrate the objectives, technical solutions, and advantages of the present invention, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be noted that any process or symbol not described in detail below can be implemented or understood by those skilled in the art with reference to the prior art. It should be understood that the specific embodiments described here only explain the invention and are not to be construed as limiting its scope of patent protection; the scope of patent protection of the invention is defined by the appended claims. In addition, the technical features involved in the embodiments of the invention described below can be combined with each other as long as they do not conflict.
As shown in Fig. 1, an image classification method based on a second-order VLAD sparse adaptive deep network comprises the following steps:
Step 1: preprocess the image with a deep convolutional neural network, choose L = 4 specific convolutional layers, and extract the features of each convolutional layer after its activation function as the L = 4 input vectors;
Specifically, the single-level feature of SASO-VLADNet and the multi-level features of M-SASO-VLADNet are extracted with a VGG-VD network. For SASO-VLADNet, the extracted single-level feature is that of the relu5_3 convolutional layer of the VGG-VD network; for M-SASO-VLADNet, the extracted multi-level features are those of the four convolutional layers relu5_1, relu5_2, relu5_3, and pool5 of the VGG-VD network. All images are resized to 448 × 448 pixels, image augmentation uses random cropping and random mirroring, and deep CNN feature extraction is implemented with the flexible and efficient deep learning library MXNet.
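The random cropping and random mirroring mentioned above can be sketched minimally in NumPy (the function name and shapes are invented; a real pipeline would also resize to 448 × 448 and normalize the image):

```python
import numpy as np

def augment(img, out=448, rng=None):
    """Randomly crop an (H, W, 3) image to out x out pixels and randomly mirror it."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    y = rng.integers(0, h - out + 1)       # top-left corner of the crop
    x = rng.integers(0, w - out + 1)
    crop = img[y:y + out, x:x + out]
    if rng.random() < 0.5:                 # horizontal mirror with probability 0.5
        crop = crop[:, ::-1]
    return crop

sample = augment(np.zeros((480, 500, 3)), rng=np.random.default_rng(0))
assert sample.shape == (448, 448, 3)
```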
Specifically, the activation function is one of the sigmoid, tanh, and ReLU functions.
Step 2: as shown in Fig. 2, the SASO-VLADNet encoding of a specific convolutional-layer feature (one of relu5_1, relu5_2, relu5_3, and pool5) is computed as follows:
Step 2.1: the feature F_i of a specific convolutional layer (one of relu5_1, relu5_2, relu5_3, and pool5) passes through the sparse adaptive soft-assignment coding (SASAC) layer and the dimensionality reduction layer, and their outputs are multiplied to obtain the first-order statistics ξ_1(F_i);
Step 2.2: ξ_1(F_i) passes through an average pooling layer and is then L2-normalized; ξ_1(F_i) also passes through the second-order layer to obtain the second-order statistics ξ_2(F_i), which is L2-normalized; the two normalized outputs are concatenated and L2-normalized to obtain the output of the SASO-VLADNet layer.
Specifically, SASO-VLADNet is initialized from VGG-VD, a front-end deep CNN pre-trained on the large-scale ImageNet dataset. Then a specific CNN feature (one of relu5_1, relu5_2, relu5_3, and pool5) is used to learn the initialization dictionary, which is obtained with the K-means algorithm of the VLFeat library. In the SASO-VLADNet model, choosing K = 128 generally yields good enough performance, so K = 128 is used.
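For illustration, the dictionary initialization can be sketched with plain Lloyd K-means in NumPy (a hypothetical stand-in for the VLFeat K-means initializer mentioned above; the data and K are toy values):

```python
import numpy as np

def kmeans_dictionary(X, K, iters=20, seed=0):
    """Learn a K-word dictionary from descriptors X of shape (N, D)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), K, replace=False)]            # random initial centers
    for _ in range(iters):
        # assign every descriptor to its nearest center
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for k in range(K):                                  # update the centers
            pts = X[labels == k]
            if len(pts):
                C[k] = pts.mean(axis=0)
    return C

# two well-separated blobs -> centers near (0, 0) and (10, 10)
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(10, 0.1, (10, 2))])
C = kmeans_dictionary(X, K=2)
assert C.shape == (2, 2)
```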
Step 3: F_i represents the descriptor set of a specific convolutional-layer feature of the i-th image, f_ij ∈ R^{D×1} is the j-th descriptor of F_i, and D denotes the vector dimension, i.e. the number of channels of the convolutional feature. For the VGG-VD network, the last few convolutional layers have 512 channels, so D = 512 in SASO-VLADNet.
The expression of the newly constructed SASAC layer in the SASO-VLADNet layer is:

λ_ij(k) = exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k) / Σ_{k′ ∈ S_T(f_ij)} exp(−‖a_{k′} ⊙ f_ij + b_{k′}‖₂² + v_{k′}) for k ∈ S_T(f_ij), and λ_ij(k) = 0 otherwise,

where ‖·‖₂ denotes the L2 norm of a vector, ⊙ denotes element-wise multiplication, F_i represents the descriptor set of a specific convolutional-layer feature of the i-th image (this descriptor set contains M descriptors in total), f_ij ∈ R^{D×1} is the j-th descriptor of F_i, D denotes the vector dimension, and a_k ∈ R^{D×1}, b_k ∈ R^{D×1}, v_k ∈ R (k = 1, 2, …, K) are respectively the weight of f_ij, the bias of f_ij, and the normalization bias; these parameters are all trainable parameters of SASO-VLADNet. There are K groups of these parameters in total, k denotes the index of a specific group, and k′ denotes the indices of the groups of parameters that satisfy the condition of the set S_T(f_ij).
S_T(f_ij) is the set satisfying the following condition: Card(S_T(f_ij)) = T, and every score exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k) with k ∈ S_T(f_ij) is no smaller than any score with index in the complement of S_T(f_ij); that is, S_T(f_ij) indexes the T largest scores. Here Card(S_T(f_ij)) is the number of elements of S_T(f_ij).
Specifically, the SASAC layer keeps the T largest values. T should be neither too large nor too small, and the specific value of T is determined by cross-validation. Experiments show that, for simplicity, setting T = 5 generally works well.
Step 4: perform dimensionality reduction with the affine subspace method.
The affine subspace layer in SASO-VLADNet is: R_k = U_k(f_ij − c_k) = (U_k f_ij + μ_k),
where μ_k = −U_k c_k ∈ R^{P×1} and U_k ∈ R^{P×D} (k = 1, 2, …, K) are the dimensionality reduction projection matrices of the affine subspace method, and P is the subspace dimension. P determines the final feature length; to obtain a good enough representation with a relatively small dimension, P = 128 is generally used.
The expression of the first-order statistics ξ_1(F_i) is:

ξ_1(F_i) = [Σ_{j=1}^{M} λ_ij(1)(U_1 f_ij + μ_1); Σ_{j=1}^{M} λ_ij(2)(U_2 f_ij + μ_2); …; Σ_{j=1}^{M} λ_ij(K)(U_K f_ij + μ_K)].

Specifically, (U_k f_ij + μ_k) can be regarded as a 1 × 1 convolutional layer with convolution weight U_k and bias μ_k, so the affine subspace layer can be trained efficiently end to end with conventional CNN training methods.
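The stated equivalence between the affine subspace layer and a 1 × 1 convolution can be checked numerically; the NumPy sketch below uses invented shapes and random values:

```python
import numpy as np

# The affine map U_k f + mu_k applied at every spatial position of a
# (D, H, W) feature map is exactly a 1 x 1 convolution with weight U_k
# and bias mu_k, which is why the layer trains like an ordinary CNN layer.
rng = np.random.default_rng(2)
D, P, H, W = 5, 3, 4, 4
U, mu = rng.normal(size=(P, D)), rng.normal(size=P)
fmap = rng.normal(size=(D, H, W))

# per-position affine subspace map
out_affine = np.stack([U @ fmap[:, i, j] + mu
                       for i in range(H) for j in range(W)],
                      axis=1).reshape(P, H, W)
# the same computation as one tensor contraction (how a 1 x 1 conv evaluates)
out_conv = np.tensordot(U, fmap, axes=([1], [0])) + mu[:, None, None]
assert np.allclose(out_affine, out_conv)
```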
Step 5: the expression of the second-order statistics ξ_2(F_i) is:

ξ_2(F_i) = vec(Σ_{k=1}^{K} Σ_{j=1}^{M} λ_ij(k)(U_k f_ij + μ_k)(U_k f_ij + μ_k)^T),

where vec is the vectorization operation that converts a matrix into the corresponding column vector.
Specifically, the covariance matrix of the first-order features is used to represent the interactions between feature channels; since the second-order statistics are differentiable, the second-order statistics layer can be trained in an end-to-end manner.
Step 6: the affine subspace layer and the second-order statistics layer can be trained with existing end-to-end methods, but the SASAC layer is a brand-new network layer, so its specific back-propagation functions are given here for end-to-end training:
Step 6.1: for each k (k = 1, 2, …, K), the expression of the SASAC layer is equivalent to three expressions:

γ_ij(k) = exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k),
β_ij(k) = γ_ij(k) if k ∈ S_T(f_ij), and β_ij(k) = 0 otherwise,
λ_ij(k) = β_ij(k) / Σ_{k′=1}^{K} β_ij(k′).

The second expression of the equivalent SASAC form can be regarded as a variant of the max pooling layer: it keeps the T largest values and forces the remaining values to 0. The third expression is a normalization layer that yields the normalized weight coefficients.
Step 6.2: for each k, let the gradient of the final classification loss J with respect to the SASAC-layer output be ∂J/∂λ_ij(k). Based on the chain rule, the partial-derivative expressions for β_ij(k) and γ_ij(k) are:

∂J/∂β_ij(k) = (∂J/∂λ_ij(k) − Σ_{k′} (∂J/∂λ_ij(k′)) λ_ij(k′)) / Σ_{k′} β_ij(k′),
∂J/∂γ_ij(k) = ∂J/∂β_ij(k) if k ∈ S_T(f_ij), and 0 otherwise.

Step 6.3: based on β_ij(k) (k = 1, 2, …, K) and the second expression of the equivalent SASAC form, the partial derivative of the loss J with respect to f_ij is:

∂J/∂f_ij = Σ_{k ∈ S_T(f_ij)} (∂J/∂γ_ij(k)) γ_ij(k) (−2 a_k ⊙ (a_k ⊙ f_ij + b_k)).

Step 6.4: based on β_ij(k) (k = 1, 2, …, K) and the second expression of the equivalent SASAC form, the partial derivatives of the loss J with respect to a_k, b_k, and v_k are:

∂J/∂a_k = Σ_{i,j: k ∈ S_T(f_ij)} (∂J/∂γ_ij(k)) γ_ij(k) (−2 f_ij ⊙ (a_k ⊙ f_ij + b_k)),
∂J/∂b_k = Σ_{i,j: k ∈ S_T(f_ij)} (∂J/∂γ_ij(k)) γ_ij(k) (−2 (a_k ⊙ f_ij + b_k)),
∂J/∂v_k = Σ_{i,j: k ∈ S_T(f_ij)} (∂J/∂γ_ij(k)) γ_ij(k).
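The backward pass of the SASAC layer can be sanity-checked against numerical differentiation. The sketch below is a hypothetical NumPy implementation assuming the Gaussian-variant score exp(−‖a_k ⊙ f + b_k‖₂² + v_k) and a toy loss J = wᵀλ; all names, shapes, and values are invented for the example:

```python
import numpy as np

def sasac_forward(f, A, B, v, T):
    g = np.exp(-np.sum((A * f + B) ** 2, axis=1) + v)   # gamma_ij(k)
    mask = np.zeros_like(g)
    mask[np.argsort(g)[-T:]] = 1.0                      # top-T "max pooling" variant
    beta = g * mask                                     # beta_ij(k)
    return g, mask, beta, beta / beta.sum()             # ..., lambda_ij(k)

def sasac_grad_f(dJ_dlam, f, A, B, v, T):
    """Gradient of the loss w.r.t. the descriptor f, by the chain rule
    through the normalization, the top-T mask, and the exponential score."""
    g, mask, beta, lam = sasac_forward(f, A, B, v, T)
    dJ_dbeta = (dJ_dlam - dJ_dlam @ lam) / beta.sum()   # through lambda = beta / sum
    dJ_dg = dJ_dbeta * mask                             # through the top-T mask
    return (dJ_dg * g) @ (-2.0 * A * (A * f + B))       # through the exponential

rng = np.random.default_rng(3)
D, K, T = 4, 6, 3
f, w = rng.normal(size=D), rng.normal(size=K)
A, B, v = rng.normal(size=(K, D)), rng.normal(size=(K, D)), rng.normal(size=K)
grad = sasac_grad_f(w, f, A, B, v, T)

eps, num = 1e-6, np.zeros(D)                            # central differences
for d in range(D):
    e = np.zeros(D); e[d] = eps
    num[d] = (w @ sasac_forward(f + e, A, B, v, T)[3]
              - w @ sasac_forward(f - e, A, B, v, T)[3]) / (2 * eps)
assert np.allclose(grad, num, rtol=1e-3, atol=1e-6)
```

The check holds at generic points where the top-T set does not change under the perturbation, which is the same piecewise-smoothness assumption made by max pooling layers.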
Step 7: after a preprocessed image is input, the convolutional feature F_i of a specific convolutional layer of the i-th picture is obtained, and the final expression of the SASO-VLAD (sparse adaptive second-order vector of locally aggregated descriptors) representation of F_i is:

ξ(F_i) = L2norm([L2norm(ξ_1(F_i) / M); L2norm(ξ_2(F_i))]),

where L2norm is the L2 normalization of a vector, and a_k, b_k, v_k, U_k, μ_k (k = 1, 2, …, K) are the trainable parameters in SASO-VLADNet.
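For illustration, the normalize-concatenate-normalize pipeline of the final representation can be sketched as follows (a minimal NumPy version; the branch contents and dimensions are invented):

```python
import numpy as np

def l2norm(x, eps=1e-12):
    """L2-normalize a vector (eps guards against division by zero)."""
    return x / (np.linalg.norm(x) + eps)

def saso_vlad(xi1, xi2, M=1):
    """Final SASO-VLAD representation: average-pool (divide by M) and
    L2-normalize the first-order branch, L2-normalize the second-order
    branch, then concatenate and L2-normalize the result."""
    return l2norm(np.concatenate([l2norm(xi1 / M), l2norm(xi2)]))

z = saso_vlad(np.array([3.0, 4.0]), np.array([1.0, 0.0, 0.0]))
assert np.isclose(np.linalg.norm(z), 1.0)
```

Normalizing each branch before concatenation keeps the first-order and second-order parts on a comparable scale.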
Specifically, the parameters a_k, b_k, v_k, U_k, μ_k (k = 1, 2, …, K) are learned in an end-to-end manner.
During parameter updating in SASO-VLADNet, the forward pass first computes the final loss of the deep network, and the gradients of the loss with respect to each parameter are then back-propagated to the input to update the entire SASO-VLADNet model.
Step 8: after the L = 4 SASO-VLADNet encodings (the encodings generated by the relu5_1, relu5_2, relu5_3, and pool5 convolutional features) are obtained, the four encodings are concatenated to obtain the final M-SASO-VLADNet encoding, as shown in Fig. 3. The M-SASO-VLADNet encoding passes through the final fully connected layer and loss layer to obtain the classification loss; the loss layer is the standard softmax loss, written as:

J = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} 1{y_i = c} log(e^{ρ_ic} / Σ_{c′=1}^{C} e^{ρ_ic′}),

where N is the number of training images, C is the number of classes, 1{·} is the indicator function (1 if the statement is true and 0 otherwise), y_i is the class label of the i-th image, and ρ_ic is the overall prediction score of the L = 4 SASO-VLADNets (the four SASO-VLADNet encodings generated by relu5_1, relu5_2, relu5_3, and pool5):

ρ_ic = Σ_{l=1}^{L} (G_c^{(l)})^T ξ(F_i^{(l)}) + B_c^{(l)},

where G_c^{(l)} and B_c^{(l)} are the weight and bias of the l-th (l = 1, 2, …, L) fully connected (FC) layer.
Specifically, ρ_ic can be further expressed as: ρ_ic = (G_c)^T [ξ(F_i^{(1)}); ξ(F_i^{(2)}); …; ξ(F_i^{(L)})] + (B_c)^T.
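The standard softmax loss over the prediction scores ρ_ic can be sketched as (a numerically stabilized NumPy version with toy scores):

```python
import numpy as np

def softmax_loss(rho, y):
    """Standard softmax (cross-entropy) loss.

    rho: (N, C) prediction scores rho_ic; y: (N,) integer class labels.
    """
    z = rho - rho.max(axis=1, keepdims=True)            # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()           # mean over images

rho = np.array([[2.0, 0.0], [0.0, 2.0]])
loss = softmax_loss(rho, np.array([0, 1]))
assert np.isclose(loss, np.log(1.0 + np.exp(-2.0)))
```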
The trained SASO-VLADNet and M-SASO-VLADNet are evaluated for image classification on an object image dataset (the Caltech256 dataset), fine-grained image datasets (the CUB200 dataset and the Stanford Cars dataset), and a texture image dataset. Compared with the NetVLAD model, SASO-VLADNet improves the image recognition rate by 2-4%, and the proposed multi-path network (M-SASO-VLADNet) improves the image recognition rate by about a further 1% over the proposed single-path network (SASO-VLADNet).
Step 9: the complete parameter update steps of the second-order VLAD sparse adaptive deep network include:
Step 9.1: obtain initialization parameters for each SASO-VLADNet layer;
Step 9.2: initialize the weights of the final fully connected layer from each SASO-VLADNet encoding and the final softmax classifier;
Step 9.3: using the above initialization parameters and end-to-end training, the gradient information of the softmax classifier is used to update the parameters of every layer in M-SASO-VLADNet until the classifier loss curve converges.

Claims (10)

1. An image classification method based on a second-order VLAD sparse adaptive deep network, characterized in that: an end-to-end-trained multi-path feature encoding network is used; nonlinear convolutional features are first extracted from the activation functions following multiple convolutional layers; the corresponding sparse adaptive second-order vector of locally aggregated descriptors (SASO-VLAD) encoding is then computed for each convolutional feature; finally, all SASO-VLAD encodings are aggregated to construct the final multi-path feature encoding network M-SASO-VLADNet, and a fully connected layer and a loss layer output the classification loss; the SASO-VLAD encoding uses sparse adaptive soft-assignment coding (SASAC) to obtain sparse weight coefficients, and the first-order and second-order VLAD encodings jointly represent the end-to-end sparse adaptive second-order VLAD model SASO-VLADNet.
2. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 1, characterized in that: in the new sparse adaptive soft-assignment coding (SASAC) method, the SASAC layer is a variant of a multidimensional Gaussian probability density function, and all of its parameters, including the dictionary and variance parameters, are learned adaptively in an end-to-end manner; the SASAC layer retains only the T largest probabilities and forces the other small probabilities to zero so as to obtain sparse weight coefficients.
3. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 1, characterized in that: the end-to-end SASO-VLAD constitutes the SASO-VLADNet layer, which is built as follows:
Step 3.1: a specific convolutional-layer CNN feature F_i passes through the SASAC layer and the dimensionality reduction layer, and their outputs are multiplied to obtain the first-order statistics ξ_1(F_i);
Step 3.2: ξ_1(F_i) passes through an average pooling layer and is then L2-normalized; ξ_1(F_i) also passes through the second-order layer to obtain the second-order statistics ξ_2(F_i), which is L2-normalized; the two normalized outputs are concatenated and L2-normalized to obtain the final output; the dimensionality reduction method is the affine subspace method.
4. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 2 or 3, characterized in that the expression of the SASAC layer is:

λ_ij(k) = exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k) / Σ_{k′ ∈ S_T(f_ij)} exp(−‖a_{k′} ⊙ f_ij + b_{k′}‖₂² + v_{k′}) for k ∈ S_T(f_ij), and λ_ij(k) = 0 otherwise,

where ‖·‖₂ denotes the L2 norm of a vector, ⊙ denotes element-wise multiplication, F_i represents the descriptor set of a specific convolutional-layer feature of the i-th image (this descriptor set contains M descriptors in total), f_ij ∈ R^{D×1} is the j-th descriptor of F_i, D denotes the vector dimension, and a_k ∈ R^{D×1}, b_k ∈ R^{D×1}, v_k ∈ R (k = 1, 2, …, K) are respectively the weight of f_ij, the bias of f_ij, and the normalization bias; these parameters are all trainable parameters of SASO-VLADNet; there are K groups of these parameters in total, k denotes the index of a specific group, and k′ denotes the indices of the groups of parameters that satisfy the condition of the set S_T(f_ij);
S_T(f_ij) is the set satisfying the following condition: Card(S_T(f_ij)) = T, and every score exp(−‖a_k ⊙ f_ij + b_k‖₂² + v_k) with k ∈ S_T(f_ij) is no smaller than any score with index in the complement of S_T(f_ij), where Card(S_T(f_ij)) is the number of elements of S_T(f_ij).
5. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 1 or 2, characterized in that the activation function is one of the sigmoid, tanh, and ReLU functions.
6. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 4, characterized in that the expression of the first-order statistics ξ_1(F_i) is:

ξ_1(F_i) = [Σ_{j=1}^{M} λ_ij(1)(U_1 f_ij + μ_1); Σ_{j=1}^{M} λ_ij(2)(U_2 f_ij + μ_2); …; Σ_{j=1}^{M} λ_ij(K)(U_K f_ij + μ_K)],

where F_i represents the descriptor set of a specific convolutional-layer feature of the i-th image (this descriptor set contains M descriptors in total), f_ij ∈ R^{D×1} is the j-th descriptor of F_i, D denotes the vector dimension, λ_ij(k) is the coding coefficient of the SASAC layer of claim 4, and U_k, μ_k are the dimensionality reduction matrices and biases in the first-order statistics; there are K groups of dimensionality reduction matrices and biases in total, k denotes the index of a specific group, and (U_k f_ij + μ_k) denotes the k-th affine subspace layer; the dimensionality reduction matrices and biases are all trainable parameters of SASO-VLADNet.
7. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 4, characterized in that the second-order statistics ξ_2(F_i) use a covariance matrix to obtain the interaction features between channels, and the expression of ξ_2(F_i) is:

ξ_2(F_i) = vec(Σ_{k=1}^{K} Σ_{j=1}^{M} λ_ij(k)(U_k f_ij + μ_k)(U_k f_ij + μ_k)^T),

where vec is the vectorization operation that converts a matrix into the corresponding column vector.
8. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 4, characterized in that: the forward pass of the SASO-VLADNet model first computes the final loss of the deep network, and the gradients of the loss with respect to each parameter are then back-propagated to the input to update the SASO-VLADNet layers; the output classification loss is the standard softmax loss.
9. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 1, characterized in that the multi-path feature encoding network trains multiple feature encoding networks simultaneously using convolutional features of multiple levels (low, middle, and high).
10. The image classification method based on a second-order VLAD sparse adaptive deep network according to claim 1, characterized in that the parameter update steps of the complete model include:
Step 1: obtain initialization parameters for each SASO-VLADNet layer;
Step 2: initialize the weights of the final fully connected layer from each SASO-VLADNet encoding and the final softmax classifier;
Step 3: using the above initialization parameters and end-to-end training, the gradient information of the softmax classifier is used to update the parameters of every layer in M-SASO-VLADNet until the classifier loss curve converges.
CN201811038736.2A 2018-09-06 2018-09-06 Image classification method based on second-order VLAD sparse adaptive depth network Active CN109255381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811038736.2A CN109255381B (en) 2018-09-06 2018-09-06 Image classification method based on second-order VLAD sparse adaptive depth network


Publications (2)

Publication Number Publication Date
CN109255381A true CN109255381A (en) 2019-01-22
CN109255381B CN109255381B (en) 2022-03-29

Family

ID=65047079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811038736.2A Active CN109255381B (en) 2018-09-06 2018-09-06 Image classification method based on second-order VLAD sparse adaptive depth network

Country Status (1)

Country Link
CN (1) CN109255381B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784420A (en) * 2019-01-29 2019-05-21 深圳市商汤科技有限公司 Image processing method and device, computer equipment and storage medium
CN109901207A (en) * 2019-03-15 2019-06-18 武汉大学 High-precision outdoor positioning method combining the BeiDou satellite system and features
CN110135460A (en) * 2019-04-16 2019-08-16 广东工业大学 Image information enhancement method based on a VLAD convolution module
CN110209859A (en) * 2019-05-10 2019-09-06 腾讯科技(深圳)有限公司 Place recognition method and apparatus, model training method and apparatus, and electronic device
CN110991480A (en) * 2019-10-31 2020-04-10 上海交通大学 Attention mechanism-based sparse coding method
CN111967528A (en) * 2020-08-27 2020-11-20 北京大学 Image recognition method based on sparse coding for deep-learning network architecture search
CN113139587A (en) * 2021-03-31 2021-07-20 杭州电子科技大学 Biquadratic pooling model for adaptive interactive structure learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336795A (en) * 2013-06-09 2013-10-02 Huazhong University of Science and Technology Video indexing method based on multiple features
CN104408479A (en) * 2014-11-28 2015-03-11 University of Electronic Science and Technology of China Large-scale image classification method based on a deep VLAD (vector of locally aggregated descriptors)
CN108460764A (en) * 2018-03-31 2018-08-28 South China University of Technology Intelligent ultrasound image segmentation method based on auto-context and data augmentation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336795A (en) * 2013-06-09 2013-10-02 Huazhong University of Science and Technology Video indexing method based on multiple features
CN104408479A (en) * 2014-11-28 2015-03-11 University of Electronic Science and Technology of China Large-scale image classification method based on a deep VLAD (vector of locally aggregated descriptors)
CN108460764A (en) * 2018-03-31 2018-08-28 South China University of Technology Intelligent ultrasound image segmentation method based on auto-context and data augmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN ET AL.: "A novel localized and second order feature coding network for image recognition", Pattern Recognition *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784420A (en) * 2019-01-29 2019-05-21 Shenzhen SenseTime Technology Co., Ltd. Image processing method and apparatus, computer device, and storage medium
CN109901207A (en) * 2019-03-15 2019-06-18 Wuhan University High-precision outdoor positioning method based on the BeiDou satellite system and feature combination
CN110135460A (en) * 2019-04-16 2019-08-16 Guangdong University of Technology Image information enhancement method based on a VLAD convolution module
CN110209859A (en) * 2019-05-10 2019-09-06 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, and electronic device for place recognition and model training
CN110209859B (en) * 2019-05-10 2022-12-27 Tencent Technology (Shenzhen) Co., Ltd. Method, apparatus, and electronic device for place recognition and model training
CN110991480A (en) * 2019-10-31 2020-04-10 Shanghai Jiao Tong University Attention-mechanism-based sparse coding method
CN111967528A (en) * 2020-08-27 2020-11-20 Peking University Image recognition method based on sparse coding for deep-learning network architecture search
CN111967528B (en) * 2020-08-27 2023-12-26 Peking University Image recognition method based on sparse coding for deep-learning network architecture search
CN113139587A (en) * 2021-03-31 2021-07-20 Hangzhou Dianzi University Biquadratic pooling model for adaptive interactive structure learning
CN113139587B (en) * 2021-03-31 2024-02-06 Hangzhou Dianzi University Biquadratic pooling model for adaptive interactive structure learning

Also Published As

Publication number Publication date
CN109255381B (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN109255381A (en) Image classification method based on a second-order VLAD sparse adaptive depth network
Morgado et al. Semantically consistent regularization for zero-shot recognition
CN112308158B (en) Multi-source domain adaptation model and method based on partial feature alignment
CN109299342B (en) Cross-modal retrieval method based on cycle generation type countermeasure network
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
Chen et al. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning
CN107526785B (en) Text classification method and device
Liao et al. Learning deep parsimonious representations
CN108121975B (en) Face recognition method combining original data and generated data
Bradley et al. Differential sparse coding
CN109063666A (en) Lightweight face recognition method and system based on depthwise separable convolution
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN113326731B (en) Cross-domain pedestrian re-identification method based on momentum network guidance
US11748919B2 (en) Method of image reconstruction for cross-modal communication system and device thereof
CN110378208B (en) Behavior recognition method based on a deep residual network
CN111444367A (en) Image title generation method based on global and local attention mechanism
CN113239784A (en) Pedestrian re-identification system and method based on space sequence feature learning
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN113688894B (en) Fine granularity image classification method integrating multiple granularity features
CN114896434B (en) Hash code generation method and device based on center similarity learning
CN109815496A (en) Generative text steganography method and device based on a capacity-adaptive shrinking mechanism
CN109033294A (en) Hybrid recommendation method incorporating content information
Li et al. Embedded stacked group sparse autoencoder ensemble with L1 regularization and manifold reduction
CN113627543A (en) Adversarial attack detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant