CN111488907B - Robust image recognition method based on dense PCANet - Google Patents
Robust image recognition method based on dense PCANet
- Publication number
- CN111488907B CN202010147376.0A
- Authority
- CN
- China
- Prior art keywords
- feature
- dense
- image
- steps
- atlas
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A robust image recognition method based on dense PCANet comprises two stages: robust feature extraction and nearest-neighbor classification based on the chi-square distance. The robust feature extraction stage uses dense connection of feature maps and dense coding of pattern maps. Dense connection means that the features output by all convolution layers are combined to form wider convolution-layer features; dense coding means that, when convolution layers are used for pattern coding, a smaller jump amplitude is used so that the pattern maps reflect the correlation between the feature maps as much as possible. The classification stage comprises the following steps: step 1, computing the distance from the image to be recognized to each training image in the high-dimensional histogram feature space based on the chi-square distance; step 2, taking the class label of the training sample with the smallest distance as the class label of the image to be recognized. The method can effectively handle occlusion, illumination change, resolution difference and other variations in the image to be recognized, thereby effectively improving the recognition rate for shifted images.
Description
Technical Field
The invention relates to the fields of image processing and pattern recognition, and in particular to robust image recognition when there is a large difference between the image to be recognized and the training images; it is mainly used for processing and recognizing real-world images.
Background
Recently, in the fields of computer vision and image recognition, deep neural networks (Deep Neural Network, DNN), represented by convolutional neural networks (Convolutional Neural Networks, CNN), have achieved great success. On some publicly available datasets, the classification ability of leading-edge deep learning methods even exceeds that of humans, for example: face verification accuracy on the LFW face database, image classification accuracy on ImageNet, and handwritten digit recognition accuracy on MNIST. In practice, however, the image to be recognized often differs greatly in "distribution" or "structure" from the training images, which causes DNN to make large-scale recognition errors; this phenomenon is called "Covariate Shift" in the field of deep learning.
Disclosure of Invention
In order to overcome the defect of a low image recognition rate caused by covariate shift in existing image recognition methods, the invention provides a robust image recognition method based on Dense PCANet (DPCANet). DPCANet can effectively solve the recognition problems caused by covariate shift, and can greatly improve image recognition performance in particular when the images to be recognized exhibit large shifts such as occlusion, illumination change and resolution difference.
The technical solution adopted by the invention to solve the above technical problem is as follows:
a robust image recognition method based on dense PCANet, comprising the steps of:
Step 1: select J images A = {A_1, …, A_J} as the training set, with corresponding class labels; let Y = {Y_1, …, Y_K} be the set of K images to be recognized, i.e., the test set. Here each A_j and Y_k denotes an image over the real numbers with C_0 ∈ {1, 3} channels and size m×n;
wherein the mean of each feature block is subtracted, the b-th (b ∈ {1, 2, …, mn}) feature block of size k×k is extracted from the c-th channel of the feature map, and vec(·) denotes the operation of stretching a matrix into a column vector;
Step 4: if the stage flag indicates that the network is in the testing stage, jump to step 7; otherwise, execute the next step;
Step 5: compute the principal directions of the mean-removed feature blocks, where the i′-th principal direction is the i′-th eigenvector of their covariance matrix, with corresponding eigenvalue λ_i′ and the eigenvalues sorted in descending order;
Step 7: compute the feature map set X^(l+1) of the (l+1)-th convolution layer as follows: 7.1) project the mean-removed feature blocks onto the principal directions; 7.2) reorganize the elements of the projection result into the feature map set X^(l+1). Here, the row/column subscript expression denotes the column vector formed by rows a through b of column c of a matrix, a % b denotes the remainder of a divided by b, ⌊a⌋ denotes rounding the real number a down, and mat_{m×n}(v) denotes rearranging an arbitrary column vector v of length mn into an m×n matrix;
Step 9: let l = l + 1 and repeat steps 3 to 8 above until l = L, where L denotes the predetermined maximum number of convolution layers;
Step 10: apply dense coding to the feature map set F to obtain the pattern map set P = {P_{i,β}}, i = 1, …, N; β = 1, …, B, where P_{i,β} (β ∈ {1, …, B}) denotes the β-th pattern map of the i-th sample and is obtained from the feature maps in the feature map subset F_i; T denotes the number of channels participating in the encoding of a single pattern map; τ (1 ≤ τ ≤ T) is the step size between the groups of feature maps selected for consecutive pattern maps; and USF(·) denotes the unit step function (Unit Step Function, USF), which binarizes the input value by comparison with 0;
Step 11: extract histogram features H from the pattern map set P: H = [H_i], i = 1, …, N, where H_i = [H_{i,1}, …, H_{i,B}]^T and H_{i,β} = Qhist(P_{i,β}); Qhist(P_{i,β}) denotes dividing the pattern map P_{i,β} into Q blocks and extracting a histogram from each block, each histogram using 2^T bins, i.e., counting, within each block, the frequency with which the code values of the pattern map fall into each of the 2^T bins;
Step 14: compute the metric matrix M = [M_{i,j}], i = 1, …, J; j = 1, …, K, where M_{i,j} is the chi-square distance between the histogram feature of the i-th training image and that of the j-th test image: M_{i,j} = Σ_{d=1}^{D} (H^A_i(d) − H^Y_j(d))^2 / (H^A_i(d) + H^Y_j(d)), wherein D denotes the length of the histogram feature vectors H^A_i and H^Y_j, H^A_i(d) denotes the d-th element of H^A_i, and H^Y_j(d) denotes the d-th element of H^Y_j;
Step 15: compute the class label Id = [Id_i], i = 1, …, K, of each sample in the test set Y: Id_i is the class label of the training sample indexed by minIndx(M_i), wherein M_i denotes the i-th column vector of the metric matrix M and minIndx(·) returns the index of the smallest element of M_i.
Further, in step 7, the feature map set X^(l+1) of the (l+1)-th convolution layer is calculated as follows:
7.2) reorganize the elements of the projection result into the feature map set X^(l+1), with c = j % C_{l+1}; here, the row/column subscript expression denotes the column vector formed by rows a through b of column c of a matrix, a % b denotes the remainder of a divided by b, ⌊a⌋ denotes rounding the real number a down, and mat_{m×n}(v) denotes rearranging an arbitrary column vector v of length mn into an m×n matrix.
The technical conception of the invention is as follows: when there are large shifts such as occlusion, illumination change and resolution difference between the images to be recognized and the training-set images, the recognition performance of existing neural network models often drops sharply, and PCANet can handle these problems relatively well. However, PCANet does not fully exploit the learned features: (1) PCANet uses only the feature maps output by the last convolution layer to generate the subsequent pattern maps and histogram features; (2) when performing pattern coding, PCANet uses a large jump and cannot fully exploit the correlation between the feature maps. To solve these problems, the invention, inspired by DenseNet, introduces dense connection and dense coding into the PCANet network model so as to enrich the features extracted by PCANet as much as possible and thereby improve its robustness. Dense connection means: combining the features output by all convolution layers to form wider convolution-layer features. Dense coding means: when using convolution layers for pattern coding, a smaller jump amplitude is used so that the pattern maps reflect the correlation between the feature maps as much as possible.
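By way of illustration only, the following minimal NumPy sketch shows the dense-connection idea under the assumption that each convolution stage yields a stack of feature maps of equal spatial size; the function name and array shapes are illustrative, not part of the original disclosure.

```python
import numpy as np

def dense_connect(per_layer_maps):
    """Union of the feature maps produced by every convolution layer.

    per_layer_maps: list of arrays, each of shape (n_maps_l, m, n).
    Plain PCANet would keep only the last entry; dense connection keeps all.
    """
    return np.concatenate(per_layer_maps, axis=0)

# Example: two stages with 4 and 8 maps of size 32x32 give 12 maps in total.
layer1 = np.random.rand(4, 32, 32)
layer2 = np.random.rand(8, 32, 32)
F_i = dense_connect([layer1, layer2])
print(F_i.shape)  # (12, 32, 32)
```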
The beneficial effects of the invention are mainly as follows: the method can more effectively handle occlusion, illumination change, resolution difference and other variations in the image to be recognized, thereby effectively improving the recognition rate for shifted images.
Drawings
Fig. 1 shows the feature map extraction process of the dense PCANet of the invention, in which the convolution operator corresponds to step 7 of the disclosure, the union operator performs dense connection on the feature maps, and the block-histogram operator extracts the block histogram features of the pattern maps, corresponding to step 11 of the disclosure;
FIG. 2 is a classification process of dense PCANet according to the present invention;
FIG. 3 shows test-set and training-set samples from the AR face database, where (a) shows test set I samples, (b) shows test set II samples, (c) shows test set III samples, and (d) shows training-set samples;
FIG. 4 is a process for extracting feature blocks from a feature map, where (a) is the original feature map, (b) is boundary zero padding, (c) is feature block selection, and (d) is the selected multi-channel feature block;
FIG. 5(a) is a schematic diagram of the vec(·) operator stretching a matrix into a column vector, and FIG. 5(b) is a schematic diagram of the mat_{m×n}(·) operator rearranging a column vector back into a matrix;
Fig. 6 shows the pattern maps generated by four networks: (a) PCANet, (b) DPCANet-1, (c) DPCANet-2, (d) DPCANet-3, where DPCANet-1 denotes DPCANet that applies only dense connection to the feature maps, DPCANet-2 denotes DPCANet that applies only dense coding to the pattern maps, and DPCANet-3 denotes DPCANet that applies both dense connection to the feature maps and dense coding to the pattern maps.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 6, a robust image recognition method based on Dense PCANet (DPCANet) comprises the following steps:
Step 1: select J images A = {A_1, …, A_J} as the training set, with corresponding class labels; let Y = {Y_1, …, Y_K} be the set of K images to be recognized, i.e., the test set. Here each A_j and Y_k denotes an image over the real numbers with C_0 ∈ {1, 3} channels and size m×n; specifically, C_0 = 1 denotes a grayscale image and C_0 = 3 denotes an RGB image. Fig. 3 shows training-set samples from the AR face database and three sample subsets of images to be recognized;
wherein the mean of each feature block is subtracted, the b-th (b ∈ {1, 2, …, mn}) feature block of size k×k is extracted from the c-th channel of the feature map, and vec(·) denotes the operation of stretching a matrix into a column vector. Here, the size of the selected feature blocks, i.e., the size of the learned PCA filters, is typically k = 3 or k = 5. When selecting the feature blocks, note that the number of feature blocks selected from a single feature map on a single channel should equal the size m×n of that feature map; to achieve this, feature blocks are selected downward and rightward at an interval of 1, and the boundaries of the feature map are padded with 0, as shown in Fig. 4(a)-(b). Fig. 5(a) gives an example of the vec(·) operation;
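A minimal NumPy sketch of this block-extraction step is given below for illustration; the stride-1 scan, zero padding and per-block mean removal follow the description above, while the function name and the column-wise vec ordering are assumptions.

```python
import numpy as np

def extract_blocks(feature_map, k=3):
    """Return the (m*n) x (k*k) matrix of vectorised, mean-removed k x k blocks.

    feature_map: (m, n) single-channel map. The map is zero-padded so that
    exactly m*n blocks are taken with stride 1, and each block has its own
    mean subtracted; vec(.) is taken here as a column-wise stretch.
    """
    m, n = feature_map.shape
    pad = k // 2
    padded = np.pad(feature_map, pad, mode="constant")   # boundary zero padding
    blocks = np.empty((m * n, k * k))
    for r in range(m):
        for c in range(n):
            patch = padded[r:r + k, c:c + k]
            v = patch.flatten(order="F")                 # vec(.): stretch block
            blocks[r * n + c] = v - v.mean()             # remove the block mean
    return blocks
```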
Step 4: if the stage flag indicates that the network is in the testing stage, jump to step 7; otherwise, execute the next step;
Step 5: compute the principal directions of the mean-removed feature blocks, where the i′-th principal direction is the i′-th eigenvector of their covariance matrix, with corresponding eigenvalue λ_i′ and the eigenvalues sorted in descending order;
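For illustration, a minimal sketch of this step, assuming the mean-removed feature blocks of all training images have been stacked row-wise into a single matrix (the normalization by the number of blocks is an assumption):

```python
import numpy as np

def pca_filters(blocks, num_filters):
    """blocks: (num_blocks, k*k) mean-removed feature blocks pooled over the
    training images; returns the num_filters leading eigenvectors of their
    covariance matrix (eigenvalues sorted in descending order)."""
    cov = blocks.T @ blocks / blocks.shape[0]       # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # descending order
    return eigvecs[:, order[:num_filters]]          # principal directions W
```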
Step 7: compute the feature map set X^(l+1) of the (l+1)-th convolution layer as follows: 7.1) project the mean-removed feature blocks onto the principal directions; 7.2) reorganize the elements of the projection result into the feature map set X^(l+1). Here, the row/column subscript expression denotes the column vector formed by rows a through b of column c of a matrix, a % b denotes the remainder of a divided by b, ⌊a⌋ denotes rounding the real number a down, and mat_{m×n}(v) denotes rearranging an arbitrary column vector v of length mn into an m×n matrix; Fig. 5(b) shows an example of the mat_{m×n}(·) operation;
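A possible NumPy sketch of steps 7.1)-7.2) for one channel of one image is shown below; it assumes the block ordering produced by the extraction sketch above (row-major over pixel positions), which stands in for the index bookkeeping of step 7.2).

```python
import numpy as np

def convolve_with_filters(blocks, W, m, n):
    """blocks: (m*n, k*k) mean-removed blocks of one channel of one image;
    W: (k*k, C_next) PCA filters. Projecting every block onto every filter
    and refolding each projection column with mat_{m x n} yields C_next
    feature maps of size m x n (the PCA 'convolution' of step 7)."""
    proj = blocks @ W                                           # (m*n, C_next)
    maps = [proj[:, j].reshape(m, n) for j in range(W.shape[1])]  # mat_{m x n}(.)
    return np.stack(maps)                                       # (C_next, m, n)
```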
Step 9: let l = l + 1 and repeat steps 3 to 8 above until l = L, where L denotes the predetermined maximum number of convolution layers; usually L ∈ {2, 3} can be taken;
Step 10: apply dense coding to the feature map set F to obtain the pattern map set P = {P_{i,β}}, i = 1, …, N; β = 1, …, B, where P_{i,β} (β ∈ {1, …, B}) denotes the β-th pattern map of the i-th sample and is obtained from the feature maps in the feature map subset F_i; T denotes the number of channels participating in the encoding of a single pattern map, typically T = 8; τ (1 ≤ τ ≤ T) controls the step size between the groups of feature maps selected for consecutive pattern maps, typically τ = T/2; and USF(·) denotes the unit step function (Unit Step Function, USF), which binarizes the input value by comparison with 0;
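The following sketch illustrates the dense coding of step 10 under one plausible reading of the grouping: T consecutive feature maps are binarised with the unit step function (taken here as 1 for non-negative inputs) and packed into an integer code, and consecutive groups start τ maps apart; the exact index arithmetic of the patent may differ.

```python
import numpy as np

def dense_encode(F_i, T=8, tau=4):
    """F_i: (num_maps, m, n) feature maps of one sample (the dense-connection
    output). Returns the (B, m, n) pattern maps; tau < T gives overlapping,
    'dense' coding, while tau = T corresponds to plain PCANet."""
    num_maps = F_i.shape[0]
    patterns = []
    for start in range(0, num_maps - T + 1, tau):
        group = F_i[start:start + T]                  # T maps per pattern map
        bits = (group >= 0).astype(np.int64)          # USF: compare with 0
        weights = 2 ** np.arange(T).reshape(T, 1, 1)  # 2^(t-1) weighting
        patterns.append((bits * weights).sum(axis=0))  # code values in [0, 2^T)
    return np.stack(patterns)
```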
Step 11: extract histogram features H from the pattern map set P: H = [H_i], i = 1, …, N, where H_i = [H_{i,1}, …, H_{i,B}]^T and H_{i,β} = Qhist(P_{i,β}); Qhist(P_{i,β}) denotes dividing the pattern map P_{i,β} into Q blocks and extracting a histogram from each block, each histogram using 2^T bins, i.e., counting, within each block, the frequency with which the code values of the pattern map fall into each of the 2^T bins;
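For step 11, a minimal sketch of the block-histogram operator Qhist is given below; splitting the pattern map into Q horizontal strips is an assumed block layout, since the text only fixes the number of blocks Q and the number of histogram bins 2^T.

```python
import numpy as np

def qhist(pattern_map, Q=8, T=8):
    """Split one pattern map into Q blocks (here: horizontal strips) and
    return the concatenated 2^T-bin histograms of the code values."""
    hists = []
    for block in np.array_split(pattern_map, Q, axis=0):
        h, _ = np.histogram(block, bins=2 ** T, range=(0, 2 ** T))
        hists.append(h)
    return np.concatenate(hists)   # feature vector of length Q * 2^T
```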
Step 14: compute the metric matrix M = [M_{i,j}], i = 1, …, J; j = 1, …, K, where M_{i,j} is the chi-square distance between the histogram feature of the i-th training image and that of the j-th test image: M_{i,j} = Σ_{d=1}^{D} (H^A_i(d) − H^Y_j(d))^2 / (H^A_i(d) + H^Y_j(d)), wherein D denotes the length of the histogram feature vectors H^A_i and H^Y_j, H^A_i(d) denotes the d-th element of H^A_i, and H^Y_j(d) denotes the d-th element of H^Y_j;
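A minimal sketch of the chi-square metric of step 14 follows; the small eps added to the denominator is a numerical safeguard assumed here, not mentioned in the text.

```python
import numpy as np

def chi_square_metric(H_train, H_test, eps=1e-10):
    """H_train: (J, D) training histogram features; H_test: (K, D) test
    histogram features. Returns the J x K chi-square distance matrix M."""
    M = np.empty((H_train.shape[0], H_test.shape[0]))
    for j, h in enumerate(H_test.astype(float)):
        diff = H_train - h
        M[:, j] = (diff ** 2 / (H_train + h + eps)).sum(axis=1)
    return M
```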
Step 15: compute the class label Id = [Id_i], i = 1, …, K, of each sample in the test set Y: Id_i is the class label of the training sample indexed by minIndx(M_i), wherein M_i denotes the i-th column vector of the metric matrix M and minIndx(·) returns the index of the smallest element of M_i.
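And a sketch of the nearest-neighbor assignment of step 15, assuming the training class labels are available as an array aligned with the rows of M:

```python
import numpy as np

def classify(M, train_labels):
    """M: (J, K) metric matrix; train_labels: length-J array of class labels
    of the training samples. Each test sample receives the label of the
    training sample with the smallest chi-square distance (minIndx of the
    corresponding column of M)."""
    nearest = np.argmin(M, axis=0)             # closest training sample per column
    return np.asarray(train_labels)[nearest]   # predicted class of each test sample
```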
Table 1 compares, for the training and test sets given in Fig. 3, the recognition rates of three versions of DPCANet (DPCANet-1, DPCANet-2, DPCANet-3) with existing methods (VGG-Face, LCNN, PCANet). It can be seen that DPCANet-1 to DPCANet-3 all outperform PCANet; DPCANet-1 and DPCANet-2 each perform better on some test sets and worse on others, while DPCANet-3 has the best recognition performance, and its advantage is especially significant when the resolution of the images to be recognized is low.
Table 1.
Claims (2)
1. A robust image recognition method based on dense PCANet, the method comprising the steps of:
Step 1: select J images A = {A_1, …, A_J} as the training set, with corresponding class labels; let Y = {Y_1, …, Y_K} be the set of K images to be recognized, i.e., the test set. Here each A_j and Y_k denotes an image over the real numbers with C_0 ∈ {1, 3} channels and size m×n;
Step 2: initialize parameters and input data: set a flag indicating the stage in which the network is located, i.e., whether the network is in the training stage or in the testing stage; let l = 0, where l indicates the layer index of the input images or feature maps within the network; initialize the layer-0 input from the input images, where N = J; let F = {F_1, …, F_N} denote the set of feature maps generated by the convolution layers, initialized to the empty set;
wherein the mean of each feature block is subtracted, the b-th (b ∈ {1, 2, …, mn}) feature block of size k×k is extracted from the c-th channel of the feature map, and vec(·) denotes the operation of stretching a matrix into a column vector;
Step 4: if the stage flag indicates that the network is in the testing stage, jump to step 7; otherwise, execute steps 5-6;
Step 5: compute the principal directions of the mean-removed feature blocks, where the i′-th principal direction is the i′-th eigenvector of their covariance matrix, with corresponding eigenvalue λ_i′ and the eigenvalues sorted in descending order;
Step 7: compute the feature map set X^(l+1) of the (l+1)-th convolution layer;
Step 9: let l = l + 1 and repeat steps 3 to 8 above until l = L, where L denotes the predetermined maximum number of convolution layers;
Step 10: apply dense coding to the feature map set F to obtain the pattern map set P = {P_{i,β}}, i = 1, …, N; β = 1, …, B, where P_{i,β} (β ∈ {1, …, B}) denotes the β-th pattern map of the i-th sample and is obtained from the feature maps in the feature map subset F_i; T denotes the number of channels participating in the encoding of a single pattern map; τ (1 ≤ τ ≤ T) is the step size used to control the interval at which the feature maps are acquired; and USF(·) denotes the unit step function, which binarizes the input value by comparison with 0;
Step 11: extract histogram features H from the pattern map set P: H = [H_i], i = 1, …, N, where H_i = [H_{i,1}, …, H_{i,B}]^T and H_{i,β} = Qhist(P_{i,β}); Qhist(P_{i,β}) denotes dividing the pattern map P_{i,β} into Q blocks and extracting a histogram from each block, each histogram using 2^T bins, i.e., counting, within each block, the frequency with which the code values of the pattern map fall into each of the 2^T bins;
Step 14: compute the metric matrix M = [M_{i,j}], i = 1, …, J; j = 1, …, K, where M_{i,j} is the chi-square distance between the histogram feature of the i-th training image and that of the j-th test image: M_{i,j} = Σ_{d=1}^{D} (H^A_i(d) − H^Y_j(d))^2 / (H^A_i(d) + H^Y_j(d)), wherein D denotes the length of the histogram feature vectors H^A_i and H^Y_j, H^A_i(d) denotes the d-th element of H^A_i, and H^Y_j(d) denotes the d-th element of H^Y_j;
Step 15: compute the class label Id = [Id_i], i = 1, …, K, of each sample in the test set Y: Id_i is the class label of the training sample indexed by minIndx(M_i), wherein M_i denotes the i-th column vector of the metric matrix M and minIndx(·) returns the index of the smallest element of M_i.
2. The robust image recognition method based on dense PCANet as recited in claim 1, wherein in said step 7, the feature map set X^(l+1) of the (l+1)-th convolution layer is calculated as follows:
7.2) reorganize the elements of the projection result into the feature map set X^(l+1), with c = j % C_{l+1}; here, the row/column subscript expression denotes the column vector formed by rows a through b of column c of a matrix, a % b denotes the remainder of a divided by b, ⌊a⌋ denotes rounding the real number a down, and mat_{m×n}(v) denotes rearranging an arbitrary column vector v of length mn into an m×n matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010147376.0A CN111488907B (en) | 2020-03-05 | 2020-03-05 | Robust image recognition method based on dense PCANet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010147376.0A CN111488907B (en) | 2020-03-05 | 2020-03-05 | Robust image recognition method based on dense PCANet |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111488907A CN111488907A (en) | 2020-08-04 |
CN111488907B true CN111488907B (en) | 2023-07-14 |
Family
ID=71811696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010147376.0A Active CN111488907B (en) | 2020-03-05 | 2020-03-05 | Robust image recognition method based on dense PCANet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111488907B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113191411B (en) * | 2021-04-22 | 2023-02-07 | 杭州卓智力创信息技术有限公司 | Electronic sound image file management method based on photo group |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105447473A (en) * | 2015-12-14 | 2016-03-30 | 江苏大学 | PCANet-CNN-based arbitrary attitude facial expression recognition method |
CN107194375A (en) * | 2017-06-20 | 2017-09-22 | 西安电子科技大学 | Video sequence sorting technique based on three-dimensional principal component analysis network |
CN109410251A (en) * | 2018-11-19 | 2019-03-01 | 南京邮电大学 | Method for tracking target based on dense connection convolutional network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10713563B2 (en) * | 2017-11-27 | 2020-07-14 | Technische Universiteit Eindhoven | Object recognition using a convolutional neural network trained by principal component analysis and repeated spectral clustering |
- 2020-03-05 CN CN202010147376.0A patent/CN111488907B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105447473A (en) * | 2015-12-14 | 2016-03-30 | 江苏大学 | PCANet-CNN-based arbitrary attitude facial expression recognition method |
CN107194375A (en) * | 2017-06-20 | 2017-09-22 | 西安电子科技大学 | Video sequence sorting technique based on three-dimensional principal component analysis network |
CN109410251A (en) * | 2018-11-19 | 2019-03-01 | 南京邮电大学 | Method for tracking target based on dense connection convolutional network |
Non-Patent Citations (3)
Title |
---|
Chan, T.H., et al. PCANet: A Simple Deep Learning Baseline for Image Classification? IEEE Transactions on Image Processing, 2015, 24(12): 5017-5032. *
Zhiwen Huang, et al. Medical Image Classification Using a Light-Weighted Hybrid Neural Network Based on PCANet and DenseNet. IEEE Access, 2020, 8: 24697-24712. *
Zhang Xingrui. Research on the application of random sampling techniques to 2D-LDA and PCANet face recognition algorithms. Information Science and Technology, 2020, (2), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN111488907A (en) | 2020-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113343707B (en) | Scene text recognition method based on robustness characterization learning | |
CN108416377A (en) | Information extracting method in block diagram and device | |
US10311322B2 (en) | Character information recognition method based on image processing | |
CN108229588B (en) | Machine learning identification method based on deep learning | |
CN102779157B (en) | Method and device for searching images | |
CN109740606A (en) | A kind of image-recognizing method and device | |
CN111488907B (en) | Robust image recognition method based on dense PCANet | |
CN113052859A (en) | Super-pixel segmentation method based on self-adaptive seed point density clustering | |
CN112686104A (en) | Deep learning-based multi-vocal music score identification method | |
CN115908421A (en) | Active learning medical image segmentation method based on superpixels and diversity | |
CN111310820A (en) | Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration | |
CN106709501B (en) | Scene matching area selection and reference image optimization method of image matching system | |
CN114387454A (en) | Self-supervision pre-training method based on region screening module and multi-level comparison | |
CN109389173B (en) | M-CNN-based test paper score automatic statistical analysis method and device | |
CN115410059B (en) | Remote sensing image part supervision change detection method and device based on contrast loss | |
CN108492345B (en) | Data block dividing method based on scale transformation | |
CN111488906B (en) | Low-resolution image recognition method based on channel correlation PCANet | |
CN111488905B (en) | Robust image recognition method based on high-dimensional PCANet | |
CN106056575A (en) | Image matching method based on object similarity recommended algorithm | |
US9092688B2 (en) | Assisted OCR | |
CN115984639A (en) | Intelligent detection method for fatigue state of part | |
CN111209872B (en) | Real-time rolling fingerprint splicing method based on dynamic programming and multi-objective optimization | |
CN111382703B (en) | Finger vein recognition method based on secondary screening and score fusion | |
CN113887737A (en) | Sample set automatic generation method based on machine learning | |
CN113223098A (en) | Preprocessing optimization method for image color classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||