CN111488905B - Robust image recognition method based on high-dimensional PCANet - Google Patents

Robust image recognition method based on high-dimensional PCANet

Info

Publication number
CN111488905B
CN111488905B (application CN202010147000.XA)
Authority
CN
China
Prior art keywords
feature
dimensional
image
pcanet
atlas
Prior art date
Legal status
Active
Application number
CN202010147000.XA
Other languages
Chinese (zh)
Other versions
CN111488905A (en)
Inventor
Li Xiaoxin (李小薪)
Xu Chenya (徐晨雅)
Hu Haigen (胡海根)
Zhou Qianwei (周乾伟)
Hao Pengyi (郝鹏翼)
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202010147000.XA
Publication of CN111488905A
Application granted
Publication of CN111488905B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A robust image recognition method based on high-dimensional PCANet comprises robust feature extraction and nearest-neighbor classification based on the chi-square distance. The feature-extraction process combines flat (per-channel) convolution with stereo (three-dimensional) convolution of the feature maps: the stereo convolution fully accounts for the correlation among channels, while the flat convolution fully decomposes the main directions of each channel of the input image, so the resulting pattern maps carry richer features than those of the original PCANet and effectively improve its robustness. The classification process comprises the following steps: step 1, computing, in the high-dimensional histogram feature space, the chi-square distance from the image to be identified to each training image; step 2, taking the class label of the training sample with the minimum distance as the class label of the image to be identified. The method can effectively handle variations such as occlusion, illumination change and resolution differences in the image to be identified, and effectively improves the recognition rate of shifted images.

Description

Robust image recognition method based on high-dimensional PCANet
Technical Field
The invention relates to the fields of image processing and pattern recognition, and in particular to robust image recognition in which the image to be recognized differs greatly from the training images; it is mainly intended for processing and recognizing real-world images.
Background
In the field of computer vision and image recognition, deep neural networks (Deep Neural Network, DNN), represented by convolutional neural networks (Convolutional Neural Networks, CNN), have achieved great success; on some public datasets, the classification ability of leading deep learning methods even exceeds that of humans, for example the face verification accuracy on the LFW face database, the image classification accuracy on ImageNet, and the handwritten digit recognition accuracy on MNIST. In practice, however, the image to be identified often differs greatly in "distribution" or "structure" from the training images, which causes DNN to make recognition errors on a large scale; this phenomenon is called "covariate shift" in the field of deep learning. Covariate shift is the cause of the low accuracy and poor feasibility of existing image recognition methods.
Disclosure of Invention
In order to overcome the low image recognition accuracy and poor feasibility caused by covariate shift, the invention provides a robust image recognition method based on high-dimensional PCANet (HPCANet) with high accuracy and good feasibility, which effectively overcomes the recognition problem caused by covariate shift and, in particular, greatly improves image recognition performance when the images to be recognized exhibit large-amplitude shifts such as occlusion, illumination change and resolution differences.
The technical solution adopted by the invention to solve this technical problem is as follows:
a robust image recognition method based on high-dimensional PCANet comprises the following steps:
Step 1: select $J$ images $A=\{A_1,\dots,A_J\}$ as the training set, whose corresponding class labels are $\ell=\{\ell_1,\dots,\ell_J\}$, and let $Z=\{Z_1,\dots,Z_K\}$ be the set of images to be identified, i.e. the test set; here $A_i,Z_j\in\mathbb{R}^{m\times n\times C_0}$ respectively denote images over the real domain with $C_0\in\{1,3\}$ channels and size $m\times n$;
Step 2: initialize parameters and input data: let $s=\mathrm{Tr}$, where $s\in\{\mathrm{Tr},\mathrm{Te}\}$ indicates the stage the network is in, $s=\mathrm{Tr}$ indicating that the network is in the training stage and $s=\mathrm{Te}$ indicating that the network is in the test stage; let $l=0$, where $l$ indicates the layer at which the input image or feature map sits in the network; let $X^{(l)}=\{X_i^{(l)}\}_{i=1,\dots,N}$, where $X_i^{(l)}=A_i$ and $N=J$;
Step 3: construct from $X^{(l)}$ the matrix $\bar X^{(l)}=[\bar X_1^{(l)},\dots,\bar X_N^{(l)}]$, where $\bar X_i^{(l)}=[\bar x_{i,1,1}^{(l)},\dots,\bar x_{i,C_l,mn}^{(l)}]$, $\bar x_{i,c,b}^{(l)}=\mathrm{vec}(x_{i,c,b}^{(l)})-\mu_{i,c,b}^{(l)}$, $\mu_{i,c,b}^{(l)}$ is the mean of $\mathrm{vec}(x_{i,c,b}^{(l)})$, $x_{i,c,b}^{(l)}$ denotes the $b$-th ($b\in\{1,2,\dots,mn\}$) feature block of size $k\times k$ extracted from the $c$-th channel of $X_i^{(l)}$, and $\mathrm{vec}(\cdot)$ denotes the operation of stretching a matrix into a column vector;
Step 4: if $s=\mathrm{Te}$, i.e. the network is in the test stage, jump to step 7; otherwise, execute the next step;
Step 5: compute the main directions of $\bar X^{(l)}$: $V^{(l)}=[v_1^{(l)},\dots,v_{C_{l+1}}^{(l)}]$, where $v_{i'}^{(l)}$ is the $i'$-th eigenvector of the covariance matrix $\bar X^{(l)}\bigl(\bar X^{(l)}\bigr)^{\mathrm T}$, the corresponding eigenvalue is $\lambda_{i'}$, and $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{k^2}$;
Step 6: obtain from $V^{(l)}$ the bank of $C_{l+1}$ flat filters $W^{(l+1)}=\{W_1^{(l+1)},\dots,W_{C_{l+1}}^{(l+1)}\}$;
Step 7: compute the feature atlas $X^{(l+1)}$ of the $(l+1)$-th convolution layer;
Step 8: let $l=l+1$ and execute steps 3 to 7 above until $l=L$, where $L$ denotes the maximum number of convolution layers, given in advance;
Step 9: initialize parameters and input data: let $s=\mathrm{Tr}$, $l=0$, $Y^{(l)}=\{Y_i^{(l)}\}_{i=1,\dots,N}$, where $Y_i^{(l)}=A_i$ and $N=J$;
Step 10: construct from $Y^{(l)}$ the matrix $\bar Y^{(l)}=[\bar Y_1^{(l)},\dots,\bar Y_N^{(l)}]$, where $\bar Y_i^{(l)}=[\bar y_{i,1}^{(l)},\dots,\bar y_{i,mn}^{(l)}]$, $\bar y_{i,b}^{(l)}=\mathrm{vec}(y_{i,b}^{(l)})-\mu_{i,b}^{(l)}$, $\mu_{i,b}^{(l)}$ is the mean of $\mathrm{vec}(y_{i,b}^{(l)})$, and $y_{i,b}^{(l)}$ denotes the $b$-th ($b\in\{1,2,\dots,mn\}$) stereo feature block of size $k\times k\times C_l$, extracted from $Y_i^{(l)}$ across all $C_l$ channels;
Step 11: if $s=\mathrm{Te}$, jump to step 14; otherwise, execute the next step;
Step 12: compute the main directions of $\bar Y^{(l)}$: $V^{(l)}=[v_1^{(l)},\dots,v_{C_{l+1}}^{(l)}]$, where $v_{i''}^{(l)}$ is the $i''$-th eigenvector of the covariance matrix $\bar Y^{(l)}\bigl(\bar Y^{(l)}\bigr)^{\mathrm T}$, the corresponding eigenvalue is $\lambda_{i''}$, and $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{k^2 C_l}$;
Step 13: obtain from $V^{(l)}$ the bank of $C_{l+1}$ stereo (three-dimensional) filters $W^{(l+1)}=\{W_1^{(l+1)},\dots,W_{C_{l+1}}^{(l+1)}\}$;
Step 14: compute the feature atlas $Y^{(l+1)}$ of the $(l+1)$-th convolution layer;
Step 15: let $l=l+1$ and execute steps 10 to 14 above until $l=L$;
Step 16: combine the feature atlases $X^{(L)}$ and $Y^{(L)}$ into a new feature atlas $F$: $F=\{F_i\}_{i=1,\dots,N}$, where $F_i$ concatenates $X_i^{(L)}$ and $Y_i^{(L)}$ along the channel direction;
Step 17: perform pattern-map coding on the feature atlas $F$ to obtain the pattern atlas $P$: $P=\{P_{i,\beta}\}_{i=1,\dots,N;\;\beta=1,\dots,B}$, where
$P_{i,\beta}=\sum_{t=1}^{T}2^{\,t-1}\,\mathrm{USF}\bigl(F_{i,(\beta-1)T+t}\bigr)$
denotes the $\beta$-th ($\beta\in\{1,\dots,B\}$) pattern map of the $i$-th sample, $F_{i,\cdot}$ denotes a channel of the feature-map subset $F_i$, $T$ denotes the number of channels involved in the coding of a single pattern map, and $\mathrm{USF}(\cdot)$ denotes the unit step function (Unit Step Function, USF), which binarizes the input value by comparison with 0, i.e.:
$\mathrm{USF}(x)=\begin{cases}1, & x>0,\\ 0, & x\le 0;\end{cases}$
Step 18: extract the histogram features $H$ from the pattern atlas $P$: $H=[H_i]_{i=1,\dots,N}$, where $H_i=[H_{i,1},\dots,H_{i,B}]^{\mathrm T}$, $H_{i,\beta}=\mathrm{Qhist}(P_{i,\beta})$, and $\mathrm{Qhist}(P_{i,\beta})$ denotes dividing the pattern map $P_{i,\beta}$ into $Q$ blocks and extracting a histogram from each block, each histogram using $2^T$ bins, i.e. counting, for each block, the frequency with which the code values of the pattern map fall into each of the $2^T$ bins;
Step 19: if $s=\mathrm{Te}$, let $H^{\mathrm{Te}}=H$ and jump to step 21; otherwise, let $H^{\mathrm{Tr}}=H$ and execute the next step;
Step 20: let $s=\mathrm{Te}$, $l=0$, $X^{(l)}=\{X_j^{(l)}\}_{j=1,\dots,N}$, where $N=K$ and $X_j^{(l)}=Z_j$, and execute steps 3 to 19 above;
Step 21: compute the metric matrix $M=[M_{i,j}]_{i=1,\dots,J;\;j=1,\dots,K}$, where $M_{i,j}=\chi^2\bigl(H_i^{\mathrm{Tr}},H_j^{\mathrm{Te}}\bigr)$; here
$\chi^2\bigl(H_i^{\mathrm{Tr}},H_j^{\mathrm{Te}}\bigr)=\sum_{d=1}^{D}\frac{\bigl(H_i^{\mathrm{Tr}}(d)-H_j^{\mathrm{Te}}(d)\bigr)^2}{H_i^{\mathrm{Tr}}(d)+H_j^{\mathrm{Te}}(d)}$,
where $D$ denotes the common length of $H_i^{\mathrm{Tr}}$ and $H_j^{\mathrm{Te}}$, and $H_i^{\mathrm{Tr}}(d)$ and $H_j^{\mathrm{Te}}(d)$ denote their $d$-th elements;
Step 22: compute the class labels $\mathrm{Id}=[\mathrm{Id}_i]_{i=1,\dots,K}$ of the samples in the test set $Z$: $\mathrm{Id}_i=\ell_{\mathrm{minIndx}(M_i)}$, where $M_i$ denotes the $i$-th column vector of the metric matrix $M$ and $\mathrm{minIndx}(\cdot)$ denotes the index of the smallest element of $M_i$.
Further, in step 7, the feature atlas $X^{(l+1)}$ of the $(l+1)$-th convolution layer is calculated as follows: 7.1) project $\bar X^{(l)}$ onto $W^{(l+1)}$: $\hat X^{(l+1)}=\bigl(W^{(l+1)}\bigr)^{\mathrm T}\bar X^{(l)}$; 7.2) reorganize the elements of $\hat X^{(l+1)}$ into the feature atlas $X^{(l+1)}=\{X_i^{(l+1)}\}_{i=1,\dots,N}$, where $X_i^{(l+1)}\in\mathbb{R}^{m\times n\times C_{l+1}C_l}$ and each channel $X_{i,j}^{(l+1)}$, $j=1,\dots,C_{l+1}C_l$, is obtained as $\mathrm{mat}_{m\times n}(v)$ from the corresponding segment $v$ of $\hat X^{(l+1)}$; here, $\hat X^{(l+1)}(a\!:\!b,c)$ denotes the column vector formed by rows $a$ to $b$ of column $c$ of $\hat X^{(l+1)}$, $a\%b$ denotes the remainder of $a$ divided by $b$, $\lfloor a\rfloor$ denotes rounding the real number $a$ down, and $\mathrm{mat}_{m\times n}(v)$ denotes rearranging an arbitrary column vector $v\in\mathbb{R}^{mn}$ into an $m\times n$ matrix.
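For illustration, the following is a minimal NumPy sketch of the two operators (the column-major reading order is an assumption consistent with FIG. 4, and the names vec and mat are ours):

    import numpy as np

    def vec(M):
        """Stretch a matrix into a column vector, column by column (FIG. 4(a))."""
        return M.reshape(-1, order="F")

    def mat(v, m, n):
        """Rearrange a length-mn vector into an m-by-n matrix (FIG. 4(b))."""
        return v.reshape(m, n, order="F")

    A = np.arange(6).reshape(2, 3)
    assert np.array_equal(mat(vec(A), 2, 3), A)  # mat is the inverse of vec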
Still further, in step 14, the feature atlas $Y^{(l+1)}$ of the $(l+1)$-th convolution layer is calculated as follows: 14.1) project $\bar Y^{(l)}$ onto $W^{(l+1)}$: $\hat Y^{(l+1)}=\bigl(W^{(l+1)}\bigr)^{\mathrm T}\bar Y^{(l)}$; 14.2) reorganize the elements of $\hat Y^{(l+1)}$ into the feature atlas $Y^{(l+1)}=\{Y_i^{(l+1)}\}_{i=1,\dots,N}$, where $Y_i^{(l+1)}\in\mathbb{R}^{m\times n\times C_{l+1}}$; here, $[\,\cdot\,]_{\cup}$ denotes connecting the matrices in the set along the channel direction.
The technical conception of the invention is as follows: when the images to be identified differ from the training images by large shifts such as occlusion, illumination change or resolution differences, the recognition performance of existing neural network models often drops sharply, whereas PCANet handles these problems comparatively well. PCANet nevertheless suffers from two drawbacks: (1) it uses only flat convolution, which does not fully account for the correlation among the channels of the feature maps; (2) when encoding pattern maps, it compresses the generated feature maps eightfold, so the resulting pattern maps lack rich discriminative features. To solve the above problems, the invention combines stereo convolution with flat convolution: the stereo convolution fully accounts for the correlation among channels, while the flat convolution fully decomposes the main directions of each channel of the input image; the resulting pattern maps therefore carry richer features than those of the original PCANet, which effectively improves the robustness of PCANet.
The beneficial effects of the invention are mainly the following: the method handles variations such as occlusion, illumination change and resolution differences in the image to be identified more effectively, thereby effectively improving the recognition rate of shifted images.
Drawings
Fig. 1 shows the feature-map extraction process of the high-dimensional PCANet according to the present invention, comprising the flat convolution operation (detailed in step 7 of the Disclosure), the merging ($\cup$) of the feature-map subsets, and the block-histogram feature extraction of the pattern maps (detailed in step 18 of the Disclosure);
FIG. 2 is a classification process of the high-dimensional PCANet of the present invention, see step 21 and step 22 of the summary, wherein NN represents the nearest neighbor classifier, id represents the final class of the image to be identified;
FIG. 3 is a training set sample and a test set sample from an AR face database, where (a) a sample of test set I, (b) a sample of test set II, (c) a sample of test set III, and (d) a sample of training set;
FIG. 4(a) shows the process by which the $\mathrm{vec}(\cdot)$ operator stretches a matrix into a column vector, and FIG. 4(b) the process by which the $\mathrm{mat}_{m\times n}(\cdot)$ operator restores a column vector to a matrix;
FIG. 5 is a process diagram of extracting feature blocks from a feature map in a flat convolution, where (a) is the original feature map, (b) is boundary zero padding, (c) is feature block selection, and (d) is the selected multi-channel feature block;
FIG. 6 is a process diagram of extracting feature blocks from a feature map in a stereo convolution, where (a) is the original feature map, (b) is boundary zero padding, (c) is feature block selection, and (d) is the selected multi-channel feature block;
fig. 7 (a) is a one-dimensional illustration of a flat filter, and fig. 7 (b) is a one-dimensional illustration of a stereo filter;
FIG. 8 is a two-dimensional illustration of a flat filter/stereo filter, wherein (a) represents the flat convolution kernel of convolution layer 1, (b) represents the flat convolution kernel of convolution layer 2, and (c) represents the stereo convolution kernel of convolution layer 2;
FIG. 9 shows the magnitudes of the feature maps generated from an image to be identified by 2-layer flat convolution and 2-layer stereo convolution, where (a) is the image to be identified (with illumination change and occlusion), (b) shows the magnitudes of the 64 feature maps generated by the 2-layer flat convolution, and (c) the magnitudes of the 64 feature maps generated by the 2-layer stereo convolution;
FIG. 10 shows 16 pattern maps generated by the high-dimensional PCANet method, where the 8 pattern maps in the first row come from feature maps generated by the flat convolution and the 8 pattern maps in the second row come from feature maps generated by the stereo convolution.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 10, a robust image recognition method based on high-dimensional PCANet (HPCANet) comprises the following steps:
Step 1: select $J$ images $A=\{A_1,\dots,A_J\}$ as the training set, whose corresponding class labels are $\ell=\{\ell_1,\dots,\ell_J\}$, and let $Z=\{Z_1,\dots,Z_K\}$ be the set of images to be identified, i.e. the test set. Here $A_i,Z_j\in\mathbb{R}^{m\times n\times C_0}$ respectively denote images over the real domain with $C_0\in\{1,3\}$ channels and size $m\times n$. FIG. 3 illustrates the training-set samples and three sample subsets of images to be identified from the AR face database;
Step 2: initialize parameters and input data: let $s=\mathrm{Tr}$, where $s\in\{\mathrm{Tr},\mathrm{Te}\}$ indicates the stage the network is in, $s=\mathrm{Tr}$ indicating the training stage and $s=\mathrm{Te}$ indicating the test stage; let $l=0$, where $l$ indicates the layer at which the input image or feature map sits in the network; let $X^{(l)}=\{X_i^{(l)}\}_{i=1,\dots,N}$, where $X_i^{(l)}=A_i$ and $N=J$;
Step 3: construct from $X^{(l)}$ the matrix $\bar X^{(l)}=[\bar X_1^{(l)},\dots,\bar X_N^{(l)}]$, where $\bar X_i^{(l)}=[\bar x_{i,1,1}^{(l)},\dots,\bar x_{i,C_l,mn}^{(l)}]$, $\bar x_{i,c,b}^{(l)}=\mathrm{vec}(x_{i,c,b}^{(l)})-\mu_{i,c,b}^{(l)}$, $\mu_{i,c,b}^{(l)}$ is the mean of $\mathrm{vec}(x_{i,c,b}^{(l)})$, $x_{i,c,b}^{(l)}$ denotes the $b$-th ($b\in\{1,2,\dots,mn\}$) feature block of size $k\times k$ extracted from the $c$-th channel of $X_i^{(l)}$, and $\mathrm{vec}(\cdot)$ denotes the operation of stretching a matrix into a column vector. FIG. 4(a) specifically depicts the process by which $\mathrm{vec}(\cdot)$ stretches a matrix into a column vector, and FIG. 5 details the process of extracting feature blocks from the feature maps in the flat convolution;
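A minimal sketch of the flat patch extraction of step 3 (the helper name flat_patch_matrix is ours; the zero padding that makes each channel yield mn blocks is an assumption consistent with FIG. 5(b)):

    import numpy as np

    def flat_patch_matrix(X, k):
        """Patch matrix of one sample for the flat convolution branch.

        X: feature map of shape (m, n, C).  After zero padding, every pixel
        of every channel yields one k-by-k block, so each channel contributes
        m*n vectorized, patch-mean-removed columns.
        Returns an array of shape (k*k, C*m*n).
        """
        m, n, C = X.shape
        p = k // 2
        Xp = np.pad(X, ((p, p), (p, p), (0, 0)))
        cols = []
        for c in range(C):
            for i in range(m):
                for j in range(n):
                    blk = Xp[i:i + k, j:j + k, c].reshape(-1, order="F")
                    cols.append(blk - blk.mean())  # remove the patch mean
        return np.stack(cols, axis=1)

The matrix $\bar X^{(l)}$ is then the horizontal concatenation of flat_patch_matrix over all $N$ samples.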
Step 4: if $s=\mathrm{Te}$, i.e. the network is in the test stage, jump to step 7; otherwise, execute the next step;
Step 5: compute the main directions of $\bar X^{(l)}$: $V^{(l)}=[v_1^{(l)},\dots,v_{C_{l+1}}^{(l)}]$, where $v_{i'}^{(l)}$ is the $i'$-th eigenvector of the covariance matrix $\bar X^{(l)}\bigl(\bar X^{(l)}\bigr)^{\mathrm T}$, the corresponding eigenvalue is $\lambda_{i'}$, and $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{k^2}$;
Step 6: obtain from $V^{(l)}$ the bank of $C_{l+1}$ flat filters $W^{(l+1)}=\{W_1^{(l+1)},\dots,W_{C_{l+1}}^{(l+1)}\}$. FIGS. 7(a) and 8(b) show one-dimensional and two-dimensional illustrations of a flat filter, respectively;
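Steps 5 and 6 amount to a principal component analysis of the patch matrix; a sketch under these assumptions (pca_filters is our name, and the normalization constant of the covariance is omitted since it does not change the eigenvectors):

    import numpy as np

    def pca_filters(Xbar, n_filters):
        """Return the n_filters principal directions of the patch matrix.

        Xbar: concatenated patch matrices of all training samples,
        shape (k*k, total_patches).  The columns of the returned
        (k*k, n_filters) array are the eigenvectors of Xbar @ Xbar.T
        with the largest eigenvalues (n_filters <= k*k).
        """
        eigvals, eigvecs = np.linalg.eigh(Xbar @ Xbar.T)  # ascending order
        order = np.argsort(eigvals)[::-1]                 # sort descending
        return eigvecs[:, order[:n_filters]]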
Step 7: calculate the feature atlas $X^{(l+1)}$ of the $(l+1)$-th convolution layer as follows: 7.1) project $\bar X^{(l)}$ onto $W^{(l+1)}$: $\hat X^{(l+1)}=\bigl(W^{(l+1)}\bigr)^{\mathrm T}\bar X^{(l)}$; 7.2) reorganize the elements of $\hat X^{(l+1)}$ into the feature atlas $X^{(l+1)}=\{X_i^{(l+1)}\}_{i=1,\dots,N}$, where $X_i^{(l+1)}\in\mathbb{R}^{m\times n\times C_{l+1}C_l}$ and each channel $X_{i,j}^{(l+1)}$, $j=1,\dots,C_{l+1}C_l$, is obtained as $\mathrm{mat}_{m\times n}(v)$ from the corresponding segment $v$ of $\hat X^{(l+1)}$; here, $\hat X^{(l+1)}(a\!:\!b,c)$ denotes the column vector formed by rows $a$ to $b$ of column $c$ of $\hat X^{(l+1)}$, $a\%b$ denotes the remainder of $a$ divided by $b$, $\lfloor a\rfloor$ denotes rounding the real number $a$ down, and $\mathrm{mat}_{m\times n}(v)$ denotes rearranging an arbitrary column vector $v\in\mathbb{R}^{mn}$ into an $m\times n$ matrix;
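One way to realize the projection and reorganization of step 7, consistent with the patch ordering of the step-3 sketch above (flat_convolve is our name; the exact channel ordering used by the patent may differ):

    import numpy as np

    def flat_convolve(Xbar_i, W, m, n, C_in):
        """Project one sample's patch matrix onto the filter bank and
        reassemble the responses into feature maps.

        Xbar_i: (k*k, C_in*m*n) patch matrix of one sample (see step 3).
        W:      (k*k, C_out) flat filter bank.
        Returns feature maps of shape (m, n, C_out*C_in): every input
        channel produces C_out response maps, each rebuilt as an m-by-n map.
        """
        C_out = W.shape[1]
        proj = W.T @ Xbar_i                     # (C_out, C_in*m*n)
        maps = proj.reshape(C_out, C_in, m, n)  # one map per (out, in) pair
        return maps.reshape(C_out * C_in, m, n).transpose(1, 2, 0)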
Step 8: let $l=l+1$ and execute steps 3 to 7 above until $l=L$, where $L$ denotes the maximum number of convolution layers, given in advance;
Step 9: initialize parameters and input data: let $s=\mathrm{Tr}$, $l=0$, $Y^{(l)}=\{Y_i^{(l)}\}_{i=1,\dots,N}$, where $Y_i^{(l)}=A_i$ and $N=J$;
Step 10: construct from $Y^{(l)}$ the matrix $\bar Y^{(l)}=[\bar Y_1^{(l)},\dots,\bar Y_N^{(l)}]$, where $\bar Y_i^{(l)}=[\bar y_{i,1}^{(l)},\dots,\bar y_{i,mn}^{(l)}]$, $\bar y_{i,b}^{(l)}=\mathrm{vec}(y_{i,b}^{(l)})-\mu_{i,b}^{(l)}$, $\mu_{i,b}^{(l)}$ is the mean of $\mathrm{vec}(y_{i,b}^{(l)})$, and $y_{i,b}^{(l)}$ denotes the $b$-th ($b\in\{1,2,\dots,mn\}$) stereo feature block of size $k\times k\times C_l$, extracted from $Y_i^{(l)}$ across all $C_l$ channels. FIG. 6 illustrates in detail the process of extracting feature blocks from the feature maps in the stereo convolution;
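The stereo counterpart of the patch extraction, following step 10 and FIG. 6 (the block size $k\times k\times C_l$ is inferred from claim 1, which bounds the number of stereo filters by $k^2C_l$; stereo_patch_matrix is our name):

    import numpy as np

    def stereo_patch_matrix(Y, k):
        """Patch matrix of one sample for the stereo convolution branch.

        Unlike the flat branch, each block spans all C channels at once,
        so a sample of shape (m, n, C) yields m*n vectorized, mean-removed
        columns of length k*k*C.  Returns an array of shape (k*k*C, m*n).
        """
        m, n, C = Y.shape
        p = k // 2
        Yp = np.pad(Y, ((p, p), (p, p), (0, 0)))
        cols = []
        for i in range(m):
            for j in range(n):
                blk = Yp[i:i + k, j:j + k, :].reshape(-1)
                cols.append(blk - blk.mean())  # remove the block mean
        return np.stack(cols, axis=1)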
Step 11: if $s=\mathrm{Te}$, jump to step 14; otherwise, execute the next step;
Step 12: compute the main directions of $\bar Y^{(l)}$: $V^{(l)}=[v_1^{(l)},\dots,v_{C_{l+1}}^{(l)}]$, where $v_{i''}^{(l)}$ is the $i''$-th eigenvector of the covariance matrix $\bar Y^{(l)}\bigl(\bar Y^{(l)}\bigr)^{\mathrm T}$, the corresponding eigenvalue is $\lambda_{i''}$, and $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{k^2 C_l}$;
Step 13: obtain from $V^{(l)}$ the bank of $C_{l+1}$ stereo (three-dimensional) filters $W^{(l+1)}=\{W_1^{(l+1)},\dots,W_{C_{l+1}}^{(l+1)}\}$. FIGS. 7(b) and 8(c) show one-dimensional and two-dimensional illustrations of a stereo filter, respectively;
Step 14: calculate the feature atlas $Y^{(l+1)}$ of the $(l+1)$-th convolution layer as follows: 14.1) project $\bar Y^{(l)}$ onto $W^{(l+1)}$: $\hat Y^{(l+1)}=\bigl(W^{(l+1)}\bigr)^{\mathrm T}\bar Y^{(l)}$; 14.2) reorganize the elements of $\hat Y^{(l+1)}$ into the feature atlas $Y^{(l+1)}=\{Y_i^{(l+1)}\}_{i=1,\dots,N}$, where $Y_i^{(l+1)}\in\mathbb{R}^{m\times n\times C_{l+1}}$; here, $[\,\cdot\,]_{\cup}$ denotes connecting the matrices in the set along the channel direction;
Step 15: let $l=l+1$ and execute steps 10 to 14 above until $l=L$;
Step 16: combine the feature atlases $X^{(L)}$ and $Y^{(L)}$ into a new feature atlas $F$: $F=\{F_i\}_{i=1,\dots,N}$, where $F_i$ concatenates $X_i^{(L)}$ and $Y_i^{(L)}$ along the channel direction. FIG. 9 shows the magnitudes of the $64\times 2=128$ feature maps generated from the image to be identified (FIG. 9(a)) by the 2-layer flat convolution and the 2-layer stereo convolution;
Step 17: perform pattern-map coding on the feature atlas $F$ to obtain the pattern atlas $P$: $P=\{P_{i,\beta}\}_{i=1,\dots,N;\;\beta=1,\dots,B}$, where $P_{i,\beta}=\sum_{t=1}^{T}2^{\,t-1}\,\mathrm{USF}\bigl(F_{i,(\beta-1)T+t}\bigr)$ denotes the $\beta$-th ($\beta\in\{1,\dots,B\}$) pattern map of the $i$-th sample, $F_{i,\cdot}$ denotes a channel of the feature-map subset $F_i$, $T$ denotes the number of channels involved in the coding of a single pattern map, and $\mathrm{USF}(\cdot)$ denotes the unit step function (Unit Step Function, USF), which binarizes the input value by comparison with 0, i.e. $\mathrm{USF}(x)=1$ if $x>0$ and $\mathrm{USF}(x)=0$ otherwise. FIG. 10 illustrates pattern maps generated by the high-dimensional PCANet method;
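A sketch of the pattern-map coding of step 17 (the weighting $2^{t-1}$ over $T$ binarized channels is the standard PCANet hashing scheme and is assumed here; encode_pattern_maps is our name):

    import numpy as np

    def encode_pattern_maps(F, T):
        """Binary-hash groups of T feature maps into integer pattern maps.

        F: feature maps of one sample, shape (m, n, B*T).  Each group of T
        consecutive channels is binarized with the unit step function and
        combined into one pattern map with code values in [0, 2**T - 1].
        Returns an array of shape (m, n, B).
        """
        m, n, C = F.shape
        B = C // T
        bits = (F > 0).astype(np.int64)      # USF: 1 if x > 0, else 0
        weights = 2 ** np.arange(T)          # 2^(t-1) for t = 1..T
        return bits.reshape(m, n, B, T) @ weights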
Step 18: extract the histogram features $H$ from the pattern atlas $P$: $H=[H_i]_{i=1,\dots,N}$, where $H_i=[H_{i,1},\dots,H_{i,B}]^{\mathrm T}$, $H_{i,\beta}=\mathrm{Qhist}(P_{i,\beta})$, and $\mathrm{Qhist}(P_{i,\beta})$ denotes dividing the pattern map $P_{i,\beta}$ into $Q$ blocks and extracting a histogram from each block, each histogram using $2^T$ bins, i.e. counting, for each block, the frequency with which the code values of the pattern map fall into each of the $2^T$ bins;
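The block histogram of step 18 may be sketched as follows (splitting the pattern map into Q horizontal strips is one possible blocking scheme; the patent only requires Q blocks; qhist is our name):

    import numpy as np

    def qhist(P, Q, T):
        """Block-wise histogram of one integer pattern map.

        P is split into Q strips, and a 2**T-bin histogram of the code
        values is taken in each strip.  Returns a vector of length Q * 2**T.
        """
        strips = np.array_split(P, Q, axis=0)
        hists = [np.bincount(s.reshape(-1), minlength=2 ** T) for s in strips]
        return np.concatenate(hists)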
Step 19: if $s=\mathrm{Te}$, let $H^{\mathrm{Te}}=H$ and jump to step 21; otherwise, let $H^{\mathrm{Tr}}=H$ and execute the next step;
Step 20: let $s=\mathrm{Te}$, $l=0$, $X^{(l)}=\{X_j^{(l)}\}_{j=1,\dots,N}$, where $N=K$ and $X_j^{(l)}=Z_j$, and execute steps 3 to 19 above;
Step 21: compute the metric matrix $M=[M_{i,j}]_{i=1,\dots,J;\;j=1,\dots,K}$, where $M_{i,j}=\chi^2\bigl(H_i^{\mathrm{Tr}},H_j^{\mathrm{Te}}\bigr)$; here $\chi^2\bigl(H_i^{\mathrm{Tr}},H_j^{\mathrm{Te}}\bigr)=\sum_{d=1}^{D}\bigl(H_i^{\mathrm{Tr}}(d)-H_j^{\mathrm{Te}}(d)\bigr)^2/\bigl(H_i^{\mathrm{Tr}}(d)+H_j^{\mathrm{Te}}(d)\bigr)$, where $D$ denotes the common length of $H_i^{\mathrm{Tr}}$ and $H_j^{\mathrm{Te}}$, and $H_i^{\mathrm{Tr}}(d)$ and $H_j^{\mathrm{Te}}(d)$ denote their $d$-th elements;
Step 22: compute the class labels $\mathrm{Id}=[\mathrm{Id}_i]_{i=1,\dots,K}$ of the samples in the test set $Z$: $\mathrm{Id}_i=\ell_{\mathrm{minIndx}(M_i)}$, where $M_i$ denotes the $i$-th column vector of the metric matrix $M$ and $\mathrm{minIndx}(\cdot)$ denotes the index of the smallest element of $M_i$.
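Steps 21 and 22 implement nearest-neighbor classification under the chi-square distance; a sketch (the small eps guarding empty histogram bins is our addition):

    import numpy as np

    def chi2_distance(h1, h2, eps=1e-12):
        """Chi-square distance between two histogram features (step 21)."""
        return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

    def classify(H_tr, H_te, labels):
        """Steps 21-22: H_tr is (J, D) training histograms, H_te is (K, D)
        test histograms, labels is the length-J class label vector."""
        M = np.array([[chi2_distance(ht, hz) for hz in H_te] for ht in H_tr])
        return labels[np.argmin(M, axis=0)]  # minIndx over each column of M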
Table 1 compares the recognition rates of three versions of HPCANet (HPCANet-1, HPCANet-2, HPCANet-3) with those of existing methods (VGG-Face, LCNN, PCANet) on the training set and test sets given in FIG. 3. All three versions of HPCANet use two convolution layers; the number of flat convolution kernels is 8 (convolution layer 1) + 8 (convolution layer 2); the numbers of stereo convolution kernels are 8 and 24 for HPCANet-1, 8 and 32 for HPCANet-2, and 8 and 40 for HPCANet-3.
TABLE 1: recognition rates of HPCANet-1/2/3, VGG-Face, LCNN, and PCANet (the table itself is reproduced as an image in the original publication)
As can be seen from Table 1, HPCANet-1 through HPCANet-3 all outperform PCANet, and the advantage is more pronounced when the resolution of the image to be identified is low; in addition, from HPCANet-1 to HPCANet-3, the recognition performance of HPCANet increases gradually as the feature dimension grows.

Claims (3)

1. A robust image recognition method based on high-dimensional PCANet, the method comprising the steps of:
step 1, selecting $J$ images $A=\{A_1,\dots,A_J\}$ as the training set, whose corresponding class labels are $\ell=\{\ell_1,\dots,\ell_J\}$, and letting $Z=\{Z_1,\dots,Z_K\}$ be the set of images to be identified, i.e. the test set, where $A_i,Z_j\in\mathbb{R}^{m\times n\times C_0}$ respectively denote images over the real domain with $C_0\in\{1,3\}$ channels and size $m\times n$;
step 2, initializing parameters and input data: letting $s=\mathrm{Tr}$, where $s\in\{\mathrm{Tr},\mathrm{Te}\}$ indicates the stage the network is in, $s=\mathrm{Tr}$ indicating that the network is in the training stage and $s=\mathrm{Te}$ indicating that the network is in the test stage; letting $l=0$, where $l$ indicates the layer at which the input image or feature map sits in the network; and letting $X^{(l)}=\{X_i^{(l)}\}_{i=1,\dots,N}$, where $X_i^{(l)}=A_i$ and $N=J$;
step 3, constructing from $X^{(l)}$ the matrix $\bar X^{(l)}=[\bar X_1^{(l)},\dots,\bar X_N^{(l)}]$, where $\bar X_i^{(l)}=[\bar x_{i,1,1}^{(l)},\dots,\bar x_{i,C_l,mn}^{(l)}]$, $\bar x_{i,c,b}^{(l)}=\mathrm{vec}(x_{i,c,b}^{(l)})-\mu_{i,c,b}^{(l)}$, $\mu_{i,c,b}^{(l)}$ is the mean of $\mathrm{vec}(x_{i,c,b}^{(l)})$, $x_{i,c,b}^{(l)}$ denotes the $b$-th ($b\in\{1,2,\dots,mn\}$) feature block of size $k\times k$ extracted from the $c$-th channel of $X_i^{(l)}$, and $\mathrm{vec}(\cdot)$ denotes the operation of stretching a matrix into a column vector;
step 4, if $s=\mathrm{Te}$, i.e. the network is in the test stage, jumping to step 7; otherwise, executing the next step;
step 5, computing the main directions of $\bar X^{(l)}$: $V^{(l)}=[v_1^{(l)},\dots,v_{C_{l+1}}^{(l)}]$, where $v_{i'}^{(l)}$ is the $i'$-th eigenvector of the covariance matrix $\bar X^{(l)}\bigl(\bar X^{(l)}\bigr)^{\mathrm T}$, the corresponding eigenvalue is $\lambda_{i'}$, and $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{k^2}$;
step 6, obtaining from $V^{(l)}$ the bank of $C_{l+1}$ flat filters $W^{(l+1)}=\{W_1^{(l+1)},\dots,W_{C_{l+1}}^{(l+1)}\}$, $C_{l+1}\le k^2$;
step 7, computing the feature atlas $X^{(l+1)}$ of the $(l+1)$-th convolution layer;
step 8, letting $l=l+1$ and executing steps 3 to 7 above until $l=L$, where $L$ denotes the maximum number of convolution layers, given in advance;
step 9, initializing parameters and input data: letting $s=\mathrm{Tr}$, $l=0$, $Y^{(l)}=\{Y_i^{(l)}\}_{i=1,\dots,N}$, where $Y_i^{(l)}=A_i$ and $N=J$;
step 10, constructing from $Y^{(l)}$ the matrix $\bar Y^{(l)}=[\bar Y_1^{(l)},\dots,\bar Y_N^{(l)}]$, where $\bar Y_i^{(l)}=[\bar y_{i,1}^{(l)},\dots,\bar y_{i,mn}^{(l)}]$, $\bar y_{i,b}^{(l)}=\mathrm{vec}(y_{i,b}^{(l)})-\mu_{i,b}^{(l)}$, $\mu_{i,b}^{(l)}$ is the mean of $\mathrm{vec}(y_{i,b}^{(l)})$, and $y_{i,b}^{(l)}$ denotes the $b$-th ($b\in\{1,2,\dots,mn\}$) stereo feature block of size $k\times k\times C_l$ extracted from $Y_i^{(l)}$ across all $C_l$ channels;
step 11, if $s=\mathrm{Te}$, jumping to step 14; otherwise, executing the next step;
step 12, computing the main directions of $\bar Y^{(l)}$: $V^{(l)}=[v_1^{(l)},\dots,v_{C_{l+1}}^{(l)}]$, where $v_{i''}^{(l)}$ is the $i''$-th eigenvector of the covariance matrix $\bar Y^{(l)}\bigl(\bar Y^{(l)}\bigr)^{\mathrm T}$, the corresponding eigenvalue is $\lambda_{i''}$, and $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_{k^2 C_l}$;
step 13, obtaining from $V^{(l)}$ the bank of $C_{l+1}$ stereo (three-dimensional) filters $W^{(l+1)}=\{W_1^{(l+1)},\dots,W_{C_{l+1}}^{(l+1)}\}$, $C_{l+1}\le k^2 C_l$;
step 14, computing the feature atlas $Y^{(l+1)}$ of the $(l+1)$-th convolution layer;
step 15, letting $l=l+1$ and executing steps 10 to 14 above until $l=L$;
step 16, combining the feature atlases $X^{(L)}$ and $Y^{(L)}$ into a new feature atlas $F$: $F=\{F_i\}_{i=1,\dots,N}$, where $F_i$ concatenates $X_i^{(L)}$ and $Y_i^{(L)}$ along the channel direction;
step 17, performing pattern-map coding on the feature atlas $F$ to obtain the pattern atlas $P$: $P=\{P_{i,\beta}\}_{i=1,\dots,N;\;\beta=1,\dots,B}$, where $P_{i,\beta}=\sum_{t=1}^{T}2^{\,t-1}\,\mathrm{USF}\bigl(F_{i,(\beta-1)T+t}\bigr)$ denotes the $\beta$-th ($\beta\in\{1,\dots,B\}$) pattern map of the $i$-th sample, $F_{i,\cdot}$ denotes a channel of the feature-map subset $F_i$, $T$ denotes the number of channels involved in the coding of a single pattern map, and $\mathrm{USF}(\cdot)$ denotes the unit step function, which binarizes the input value by comparison with 0, i.e. $\mathrm{USF}(x)=1$ if $x>0$ and $\mathrm{USF}(x)=0$ otherwise;
step 18, extracting the histogram features $H$ from the pattern atlas $P$: $H=[H_i]_{i=1,\dots,N}$, where $H_i=[H_{i,1},\dots,H_{i,B}]^{\mathrm T}$, $H_{i,\beta}=\mathrm{Qhist}(P_{i,\beta})$, and $\mathrm{Qhist}(P_{i,\beta})$ denotes dividing the pattern map $P_{i,\beta}$ into $Q$ blocks and extracting a histogram from each block, each histogram using $2^T$ bins, i.e. counting, for each block, the frequency with which the code values of the pattern map fall into each of the $2^T$ bins;
step 19, if $s=\mathrm{Te}$, letting $H^{\mathrm{Te}}=H$ and jumping to step 21; otherwise, letting $H^{\mathrm{Tr}}=H$ and executing the next step;
step 20, letting $s=\mathrm{Te}$, $l=0$, $X^{(l)}=\{X_j^{(l)}\}_{j=1,\dots,N}$, where $N=K$ and $X_j^{(l)}=Z_j$, and executing steps 3 to 19 above;
step 21, computing the metric matrix $M=[M_{i,j}]_{i=1,\dots,J;\;j=1,\dots,K}$, where $M_{i,j}=\chi^2\bigl(H_i^{\mathrm{Tr}},H_j^{\mathrm{Te}}\bigr)=\sum_{d=1}^{D}\bigl(H_i^{\mathrm{Tr}}(d)-H_j^{\mathrm{Te}}(d)\bigr)^2/\bigl(H_i^{\mathrm{Tr}}(d)+H_j^{\mathrm{Te}}(d)\bigr)$, $D$ denoting the common length of $H_i^{\mathrm{Tr}}$ and $H_j^{\mathrm{Te}}$, and $H_i^{\mathrm{Tr}}(d)$ and $H_j^{\mathrm{Te}}(d)$ their $d$-th elements;
step 22, computing the class labels $\mathrm{Id}=[\mathrm{Id}_i]_{i=1,\dots,K}$ of the samples in the test set $Z$: $\mathrm{Id}_i=\ell_{\mathrm{minIndx}(M_i)}$, where $M_i$ denotes the $i$-th column vector of the metric matrix $M$ and $\mathrm{minIndx}(\cdot)$ denotes the index of the smallest element of $M_i$.
2. The robust image recognition method based on high-dimensional PCANet as recited in claim 1, wherein in said step 7 the feature atlas $X^{(l+1)}$ of the $(l+1)$-th convolution layer is calculated as follows: 7.1) projecting $\bar X^{(l)}$ onto $W^{(l+1)}$: $\hat X^{(l+1)}=\bigl(W^{(l+1)}\bigr)^{\mathrm T}\bar X^{(l)}$; 7.2) reorganizing the elements of $\hat X^{(l+1)}$ into the feature atlas $X^{(l+1)}=\{X_i^{(l+1)}\}_{i=1,\dots,N}$, where $X_i^{(l+1)}\in\mathbb{R}^{m\times n\times C_{l+1}C_l}$ and each channel $X_{i,j}^{(l+1)}$, $j=1,\dots,C_{l+1}C_l$, is obtained as $\mathrm{mat}_{m\times n}(v)$ from the corresponding segment $v$ of $\hat X^{(l+1)}$, with $c=j\%C_{l+1}$; here, $\hat X^{(l+1)}(a\!:\!b,c)$ denotes the column vector formed by rows $a$ to $b$ of column $c$ of $\hat X^{(l+1)}$, $a\%b$ denotes the remainder of $a$ divided by $b$, $\lfloor a\rfloor$ denotes rounding the real number $a$ down, and $\mathrm{mat}_{m\times n}(v)$ denotes rearranging an arbitrary column vector $v\in\mathbb{R}^{mn}$ into an $m\times n$ matrix.
3. The robust image recognition method based on high-dimensional PCANet according to claim 1 or 2, wherein in said step 14 the feature atlas $Y^{(l+1)}$ of the $(l+1)$-th convolution layer is calculated as follows: 14.1) projecting $\bar Y^{(l)}$ onto $W^{(l+1)}$: $\hat Y^{(l+1)}=\bigl(W^{(l+1)}\bigr)^{\mathrm T}\bar Y^{(l)}$; 14.2) reorganizing the elements of $\hat Y^{(l+1)}$ into the feature atlas $Y^{(l+1)}=\{Y_i^{(l+1)}\}_{i=1,\dots,N}$, where $Y_i^{(l+1)}\in\mathbb{R}^{m\times n\times C_{l+1}}$; here, $[\,\cdot\,]_{\cup}$ denotes connecting the matrices in the set along the channel direction.
CN202010147000.XA 2020-03-05 2020-03-05 Robust image recognition method based on high-dimensional PCANet Active CN111488905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147000.XA CN111488905B (en) 2020-03-05 2020-03-05 Robust image recognition method based on high-dimensional PCANet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010147000.XA CN111488905B (en) 2020-03-05 2020-03-05 Robust image recognition method based on high-dimensional PCANet

Publications (2)

Publication Number Publication Date
CN111488905A CN111488905A (en) 2020-08-04
CN111488905B true CN111488905B (en) 2023-07-14

Family

ID=71794288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010147000.XA Active CN111488905B (en) 2020-03-05 2020-03-05 Robust image recognition method based on high-dimensional PCANet

Country Status (1)

Country Link
CN (1) CN111488905B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573729A (en) * 2015-01-23 2015-04-29 东南大学 Image classification method based on kernel principal component analysis network
CN106778554A (en) * 2016-12-01 2017-05-31 广西师范大学 Cervical cell image-recognizing method based on union feature PCANet

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10713563B2 (en) * 2017-11-27 2020-07-14 Technische Universiteit Eindhoven Object recognition using a convolutional neural network trained by principal component analysis and repeated spectral clustering
US10747989B2 (en) * 2018-08-21 2020-08-18 Software Ag Systems and/or methods for accelerating facial feature vector matching with supervised machine learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573729A (en) * 2015-01-23 2015-04-29 东南大学 Image classification method based on kernel principal component analysis network
CN106778554A (en) * 2016-12-01 2017-05-31 广西师范大学 Cervical cell image-recognizing method based on union feature PCANet

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Xiaoxin, et al. Adaptive Weberfaces for occlusion-robust face representation and recognition. IET Image Processing. 2017, Vol. 11, No. 11, pp. 964-975. *
Han Bing, et al. An improved principal component analysis network method for aurora image classification. Journal of Xidian University. 2017, Vol. 44, No. 1, pp. 83-88. *

Also Published As

Publication number Publication date
CN111488905A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN110738207A (en) character detection method for fusing character area edge information in character image
CN110443128B (en) Finger vein identification method based on SURF feature point accurate matching
CN111027377B (en) Double-flow neural network time sequence action positioning method
CN111652273B (en) Deep learning-based RGB-D image classification method
CN111339924B (en) Polarized SAR image classification method based on superpixel and full convolution network
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN113674334A (en) Texture recognition method based on depth self-attention network and local feature coding
CN112966574A (en) Human body three-dimensional key point prediction method and device and electronic equipment
CN110188864B (en) Small sample learning method based on distribution representation and distribution measurement
CN114387454A (en) Self-supervision pre-training method based on region screening module and multi-level comparison
CN111127407B (en) Fourier transform-based style migration forged image detection device and method
CN110516640B (en) Vehicle re-identification method based on feature pyramid joint representation
CN111488905B (en) Robust image recognition method based on high-dimensional PCANet
CN111488907B (en) Robust image recognition method based on dense PCANet
CN109583584B (en) Method and system for enabling CNN with full connection layer to accept indefinite shape input
CN111539966A (en) Colorimetric sensor array image segmentation method based on fuzzy c-means clustering
CN115937540A (en) Image Matching Method Based on Transformer Encoder
CN112818779B (en) Human behavior recognition method based on feature optimization and multiple feature fusion
CN111488906B (en) Low-resolution image recognition method based on channel correlation PCANet
CN113192076B (en) MRI brain tumor image segmentation method combining classification prediction and multi-scale feature extraction
CN111382703B (en) Finger vein recognition method based on secondary screening and score fusion
CN110599518B (en) Target tracking method based on visual saliency and super-pixel segmentation and condition number blocking
JP3454335B2 (en) Online handwritten character recognition method and apparatus
CN112419373A (en) Large-displacement optical flow field estimation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant