CN108416389B - Image classification method based on noise reduction sparse automatic encoder and density space sampling - Google Patents
- Publication number
- CN108416389B · CN201810212714.7A · CN201810212714A
- Authority
- CN
- China
- Prior art keywords
- image
- training
- noise reduction
- automatic encoder
- fisher vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an image classification method based on a noise reduction sparse automatic encoder and density space sampling. The method comprises the following steps: construct an image block training set; construct a single-hidden-layer noise reduction sparse automatic encoder, feed it the image block training set, and train it; perform density space sampling on each image in the training image dataset and the test image dataset; use the noise reduction sparse automatic encoder to extract local feature set information from the spatial regions obtained by density space sampling of each image; encode the feature set information with two layers of stacked Fisher vectors to obtain the final Fisher vector of each image; and train a classifier with the Fisher vectors to classify the images. The invention acquires image information accurately, improves image classification accuracy, and can be used to build large-scale image classification and retrieval systems.
Description
Technical Field
The invention belongs to the technical field of image classification, and particularly relates to an image classification method based on a noise reduction sparse automatic encoder and density space sampling.
Background
With the development of multimedia technology, image classification has become a hot research topic in the field of computer vision. Image classification assigns images to different preset categories according to certain attributes of the images. Expressing images effectively is the key to improving classification accuracy, and feature selection and feature extraction remain the central difficulties of current image classification. Traditional hand-designed features such as Gabor filters, SIFT, LBP and HOG achieve some success in image classification, but they must be carefully designed and do not transfer well across specific problems. In recent years, deep learning has achieved great success in application fields such as image recognition and speech recognition. The convolutional neural network (CNN), a deep learning model, has made breakthrough progress in feature extraction, but a CNN needs a large amount of manually labeled data, whereas a noise reduction sparse automatic encoder can learn features describing image content from large amounts of unlabeled data. The traditional spatial pyramid is widely used in image classification, but its grid proportions are fixed and its partitioning scheme is single; a density space sampling method based on a sliding window can capture more image spatial information.
Disclosure of Invention
The invention provides an image classification method based on a noise reduction sparse automatic encoder and density space sampling, which aims to solve the problems of image feature extraction and encoding, overcome the defects of existing image classification methods, reduce the computational cost, and improve the classification accuracy.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
an image classification method based on a noise reduction sparse automatic encoder and density space sampling comprises the following steps:
(1) constructing an image block training set;
(2) constructing a noise reduction sparse automatic encoder of a single hidden layer, inputting the image block training set obtained in the step (1), and training the noise reduction sparse automatic encoder until an iteration stopping condition is met;
(3) perform density space sampling on each image in the training image dataset M1 and the test image dataset M2; let the number of images in the training image dataset M1 be m1 and the number of images in the test image dataset M2 be m2;
(4) Taking a trained noise reduction sparse automatic encoder as a feature extractor, and extracting local feature set information from a space region obtained by density space sampling of each image;
(5) encoding the feature set information extracted in the step (4) by adopting two layers of stacked Fisher vectors to obtain a final Fisher Vector which is used as a feature Vector of image classification;
(6) train a classifier with the Fisher vectors of the training image dataset M1 and their corresponding labels, then input the Fisher vectors of the test image dataset M2 into the trained classifier to classify the images.
Further, the specific process of step (1) is as follows:
(1a) obtain an unlabeled image set {I_i, i = 1, …, N}, where I_i denotes the i-th image and N is the number of unlabeled images;
(1b) randomly extract M image blocks of size r × r from each image I_i, giving MN image blocks in total, which form the image block training set X = {x_j}, j = 1, …, MN, where each block is flattened into a vector of dimension n = r*r*3.
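Step (1b) above can be sketched as follows. This is a minimal Python illustration (the embodiment itself was developed in Matlab); the function name and the uniform random test image are assumptions for demonstration only.

```python
import numpy as np

def extract_patches(image, M, r, rng):
    """Randomly crop M patches of size r x r from an H x W x 3 image
    and flatten each to a vector of length r*r*3."""
    H, W, _ = image.shape
    patches = np.empty((M, r * r * 3))
    for m in range(M):
        top = rng.integers(0, H - r + 1)
        left = rng.integers(0, W - r + 1)
        patches[m] = image[top:top + r, left:left + r, :].reshape(-1)
    return patches

rng = np.random.default_rng(0)
img = rng.random((96, 96, 3))           # one STL-10-sized RGB image
X = extract_patches(img, M=10, r=8, rng=rng)
print(X.shape)                          # (10, 192) since 8*8*3 = 192
```

Stacking such patch matrices over all N unlabeled images yields the MN-block training set.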
Further, the specific process of step (2) is as follows:
(2a) add noise to the image block training set X: randomly corrupt each image block x_j in X according to ratio q to obtain x_j' ~ q(x_j' | x_j), where x_j' is obtained by setting a fraction q of the values of the vector x_j to 0, j ∈ {1, …, MN};
(2b) randomly initialize the model parameters W1, W2, b1, b2; set the number of neurons in the input layer of the noise reduction sparse automatic encoder to n and the number of neurons in the hidden layer to s, where W1 ∈ R^(s×n) represents the encoding weight, W2 ∈ R^(n×s) the decoding weight, b1 ∈ R^s the encoding bias term, and b2 ∈ R^n the decoding bias term;
(2c) define the overall cost function Loss of the noise reduction sparse automatic encoder as

Loss = (1/MN) Σ_{j=1}^{MN} (1/2)‖x_j − x̂_j‖² + (λ/2)(‖W1‖² + ‖W2‖²) + β Σ_{t=1}^{s} KL(ρ ‖ ρ̂_t),

where x̂_j is the reconstruction of the noisy input x_j', λ is the weight attenuation coefficient, β is the sparse penalty weight, ρ is the target sparse value, ρ̂_t is the average response of all training samples on the t-th hidden neuron, and KL(ρ ‖ ρ̂_t) = ρ log(ρ/ρ̂_t) + (1−ρ) log((1−ρ)/(1−ρ̂_t));
(2d) calculate the gradient of the overall cost function Loss using the error back-propagation algorithm; solve the minimization problem of the overall cost function Loss by the limited-memory quasi-Newton method L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno), thereby obtaining the trained model parameters W1, b1, W2, b2.
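The cost of step (2c) and the minimization of step (2d) can be sketched as follows. This is a toy-sized Python illustration, not the patented implementation: the dimensions are tiny so SciPy's L-BFGS-B with finite-difference gradients suffices, whereas the method itself computes exact gradients with back-propagation. All names and sizes here are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, s, T = 4, 3, 20                    # input dim, hidden dim, sample count (toy)
lam, beta, rho, q = 0.03, 3.0, 0.1, 0.5

X = rng.random((n, T))                # clean blocks (columns)
Xn = X * (rng.random((n, T)) > q)     # masking noise: a fraction q of entries set to 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(p):
    i = 0
    W1 = p[i:i + s * n].reshape(s, n); i += s * n
    W2 = p[i:i + n * s].reshape(n, s); i += n * s
    b1 = p[i:i + s]; i += s
    return W1, W2, b1, p[i:i + n]

def loss(p):
    W1, W2, b1, b2 = unpack(p)
    A = sigmoid(W1 @ Xn + b1[:, None])           # hidden activations on noisy input
    Xh = sigmoid(W2 @ A + b2[:, None])           # reconstruction
    rec = 0.5 * np.mean(np.sum((Xh - X) ** 2, axis=0))   # reconstruct the CLEAN input
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = np.clip(A.mean(axis=1), 1e-6, 1 - 1e-6)    # average response per hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return rec + decay + beta * kl

p0 = rng.normal(scale=0.1, size=2 * s * n + s + n)
res = minimize(loss, p0, method="L-BFGS-B", options={"maxiter": 200})
print(loss(p0), "->", res.fun)        # the overall cost decreases
```

The key denoising detail is that the encoder sees the corrupted input Xn while the reconstruction error is measured against the clean X.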
Further, the specific process of step (3) is as follows:
(3a) separately acquiring training image data sets M1And a test image dataset M2Each image of (a);
(3b) performing density space sampling on each image acquired in the step (3a) from left to right and from top to bottom in turn in an iterative manner by using a sliding window with variable dimensions, wherein each image sampling obtains R spatial regions; defining the size of an initial sliding window as [ w, h ], the step length of each sliding of the sliding window as t, and defining the space region acquired by each sliding of the sliding window as a quadruple Area (m, n, w, h); where m and n represent the coordinate positions of the top left corner of the sliding window in the image, and w and h represent the width and height of the initial sliding window.
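The enumeration of quadruples Area(m, n, w, h) in step (3b) can be sketched as below. Python sketch; the exact set of window sizes per image is not specified in this generic step, so a single 46 × 46 window on a 96 × 96 image (the embodiment's initial setting) is used for illustration.

```python
def density_space_sampling(img_w, img_h, window_sizes, t):
    """Enumerate quadruples (m, n, w, h): slide each window left-to-right,
    top-to-bottom with step t, as described in step (3b)."""
    regions = []
    for (w, h) in window_sizes:
        for n in range(0, img_h - h + 1, t):        # n: top coordinate
            for m in range(0, img_w - w + 1, t):    # m: left coordinate
                regions.append((m, n, w, h))
    return regions

regions = density_space_sampling(96, 96, [(46, 46)], t=10)
print(len(regions))   # 6 positions per axis -> 36 regions at this single scale
```

Adding further entries to `window_sizes` gives the variable-dimension behavior the step describes.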
Further, the specific process of step (4) is as follows:
(4a) randomly extract a number of image blocks from each spatial region of each image obtained in step (3b);
(4b) take the trained noise reduction sparse automatic encoder as a feature extractor to obtain the local feature set A_l of the image blocks in the l-th spatial region of image i; each image acquires R local feature sets, corresponding to its R spatial regions.
Further, the specific steps of step (5) are as follows:
(5a) model all local feature sets of the training image dataset M1 with a Gaussian mixture model containing K Gaussian components, and solve to obtain the parameter θ of the model;
(5b) on the basis of step (5a), encode the local feature set of the l-th spatial region of image i with the Fisher vector of the first layer to obtain the Fisher vector b_l; after the Fisher vector coding of the first layer, the Fisher vector set B_i of image i is obtained;
(5c) perform PCA (principal component analysis) dimensionality reduction on the Fisher vector sets of all images in the training image dataset M1 and the test image dataset M2; the Fisher vector set of image i after dimensionality reduction is B_i';
(5d) model the dimension-reduced Fisher vector sets of all images in the training image dataset M1 with a Gaussian mixture model containing K' Gaussian components, and solve to obtain the parameter θ' of the model;
(5e) on the basis of step (5d), encode the dimension-reduced Fisher vector set B_i' of image i with the Fisher vector of the second layer to obtain the Fisher vector f_i, thereby obtaining the final Fisher vector of each image in the training image dataset M1 and the test image dataset M2.
Further, in step (5a), the Gaussian mixture model of K Gaussian units is written as p(a_t) = Σ_{i=1}^{K} π_i p_i(a_t), where a_t is a single local feature, p_i(a_t) represents the i-th Gaussian unit, d represents the dimension of the local feature a_t, and π_i, μ_i, Σ_i respectively represent the weight, mean and covariance matrix of the i-th Gaussian unit; all parameters θ = {π_i, μ_i, Σ_i, i = 1, …, K} of the Gaussian mixture model are estimated from the training image dataset M1.
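The GMM of step (5a) and the first-layer Fisher vector coding of step (5b) can be sketched as follows. The gradient formulas follow the standard improved Fisher vector of Perronnin et al. (gradients with respect to means and diagonal variances); the patent does not spell out its exact FV formulation, so treat this as an assumed, representative variant with toy sizes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
K, D, T = 4, 6, 500                    # toy sizes; the embodiment uses K=128, D=192
feats = rng.random((T, D))             # one region's local feature set A_l

gmm = GaussianMixture(n_components=K, covariance_type="diag",
                      random_state=0).fit(feats)

def fisher_vector(A, gmm):
    """Normalized log-likelihood gradients w.r.t. GMM means and diagonal
    variances; output length 2*K*D."""
    gamma = gmm.predict_proba(A)                   # (T, K) responsibilities
    Tn = A.shape[0]
    sigma = np.sqrt(gmm.covariances_)              # (K, D) standard deviations
    parts = []
    for k in range(gmm.n_components):
        d = (A - gmm.means_[k]) / sigma[k]         # whitened deviations
        g = gamma[:, k:k + 1]
        parts.append((g * d).sum(0) / (Tn * np.sqrt(gmm.weights_[k])))
        parts.append((g * (d ** 2 - 1)).sum(0) / (Tn * np.sqrt(2 * gmm.weights_[k])))
    return np.concatenate(parts)

b_l = fisher_vector(feats, gmm)
print(b_l.shape)                       # (2*K*D,) = (48,)
```

Collecting b_l over the R regions of an image gives its first-layer Fisher vector set B_i.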
Further, in step (6), the classifier employs a support vector machine.
The beneficial effects brought by the above technical scheme are:
(1) The invention adopts the noise reduction sparse automatic encoder as the feature extractor. This model is an unsupervised feature learning method: it can be trained with unlabeled samples, which addresses the current need for large numbers of manually labeled samples. In addition, the noise reduction sparse automatic encoder learns a more robust feature representation than an ordinary sparse autoencoder.
(2) The invention adopts a density space sampling method in place of the traditional spatial pyramid structure. The traditional spatial pyramid partitions the image with fixed grids, whereas the density space sampling method partitions the image more flexibly and can therefore capture more image spatial information.
(3) The invention adopts stacked Fisher vectors for feature coding. Compared with the standard single-layer structure, the stacked structure extracts the semantic information of the image hierarchically and achieves better accuracy and performance in image classification.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail in the following with the accompanying drawings.
The software and hardware environment of the test experiment of this example is as follows:
Hardware type:
Computer type: desktop;
CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Memory: 8.00 GB
System type: 64-bit operating system
Development language: Matlab
This example uses the STL-10 database, which contains 10 classes of RGB images, each of size 96 × 96. The number of training samples for supervised training is 5000; the 5000 training samples are divided into ten folds, with 1000 training samples used for supervised training each time, and the number of test samples is 8000.
With reference to FIG. 1, the image classification method based on a noise reduction sparse automatic encoder and density space sampling provided by the invention comprises the following specific steps.
Step 1, constructing an image block training set:
1a. Obtain the unlabeled image set {I_i, i = 1, …, 100000} in STL-10, where I_i denotes the i-th image; the number of unlabeled images is 100000.
1b. Randomly extract 10 image blocks of size 8 × 8 from each image I_i to form the image block training set X = {x_j}, j = 1, …, 1000000, each block flattened to a vector of dimension 8*8*3 = 192; the multiplication by 3 is because there are R, G, B three channels.
Step 2, constructing a single hidden layer noise reduction sparse automatic encoder, inputting the image block training set obtained in the step 1b, and training the noise reduction sparse automatic encoder until the iteration stopping condition is met:
2a. Add noise to the image block training set X: randomly corrupt each image block vector x_j in X according to ratio q to obtain x_j' ~ q(x_j' | x_j), where x_j' is obtained by setting a fraction q of the values of x_j to 0. Here q is set to 0.5 and j ∈ {1, …, 1000000};
2b. Randomly initialize the model parameters W1, W2, b1, b2. The number of neurons in the input layer of the noise reduction sparse automatic encoder is 192 and the number of neurons in the hidden layer is 500, so W1 ∈ R^(500×192) is the encoding weight, W2 ∈ R^(192×500) the decoding weight, b1 ∈ R^500 the encoding bias term, and b2 ∈ R^192 the decoding bias term. The hidden activation is α_j = g(W1 x_j' + b1), where g(z) = 1/(1 + e^(−z)) is the sigmoid function;
2c. Define the overall cost function Loss of the DSAE as

Loss = (1/MN) Σ_{j=1}^{MN} (1/2)‖x_j − x̂_j‖² + (λ/2)(‖W1‖² + ‖W2‖²) + β Σ_{t=1}^{500} KL(ρ ‖ ρ̂_t),

where x̂_j is the reconstruction of the noisy input x_j', λ is the weight attenuation coefficient, β is the sparse penalty weight, ρ is the target sparse value, and ρ̂_t is the average response of all training samples on the t-th hidden neuron; here λ = 0.03, β = 3, ρ = 0.1;
2d. Calculate the gradient of the overall cost function Loss using the error back-propagation algorithm. Solve the minimization problem of the cost function Loss by the limited-memory quasi-Newton method L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno), thereby obtaining the trained model parameters W1, b1, W2, b2.
Step 3. Perform density space sampling on each image in the training image dataset M1 and the test image dataset M2, where the number of images in M1 is 1000 and the number of images in M2 is 8000:
3a, respectively acquiring training image data sets M1And a test image dataset M2Each image of (a);
3b, density spatial sampling is performed on each image in step 3a in an iterative manner from left to right and from top to bottom in turn using a sliding window of variable dimensions. Defining the size of an initial sliding window as [ w, h ], the step length of each sliding of the sliding window is 10, and the space region acquired by each sliding of the sliding window is defined as a quadruple Area (m, n, w, h). Where m and n represent the coordinate positions of the top left corner of the sliding window in the image, and w and h represent the width and height of the initial sliding window. m and n are initially set to 0, w and h are initially set to 46, and each image can be sampled to obtain 90 spatial regions.
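A quick sanity check of the numbers in step 3b: a 46 × 46 window sliding with step 10 over a 96 × 96 image visits 6 positions per axis, i.e. 36 regions at that single scale. The 90 regions per image reported above therefore presumably accumulate over several additional window scales; the exact scale schedule is not listed here.

```python
# Positions of a 46x46 window sliding with step 10 over a 96x96 STL-10 image.
positions = list(range(0, 96 - 46 + 1, 10))   # 0, 10, 20, 30, 40, 50
per_scale = len(positions) ** 2
print(per_scale)                              # 36 regions at the initial 46x46 scale
```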
Step 4, taking the trained noise reduction sparse automatic encoder as a feature extractor, and extracting local feature set information from a space region obtained by density space sampling of each image:
4a. Randomly extract 100 image blocks of size 8 × 8 from each spatial region of each image obtained in step 3b;
4b. Take the trained noise reduction sparse automatic encoder as a feature extractor to obtain the local feature set A_l of the 100 8 × 8 image blocks in the l-th spatial region of image i; each image captures 90 feature sets.
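The feature extraction of step 4b is just the encoder's forward pass: each 192-dimensional block maps to a 500-dimensional hidden activation. A sketch with randomly initialized (untrained) parameters, for shape bookkeeping only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 192, 500                        # 8*8*3 input, 500 hidden units (as in step 2b)
W1 = rng.normal(scale=0.05, size=(s, n))   # stand-in for the trained encoding weight
b1 = np.zeros(s)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

patches = rng.random((100, n))         # 100 flattened 8x8 blocks from one region
A_l = sigmoid(patches @ W1.T + b1)     # each row is one 500-dim local feature
print(A_l.shape)                       # (100, 500)
```

In the method itself, W1 and b1 come from the L-BFGS training of step 2d.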
And 5, encoding the feature set information in the step 4b by using two layers of stacked Fisher vectors to obtain a final Fisher Vector as a feature Vector of image classification:
5a. Model all local feature sets of the training image dataset M1 with a Gaussian mixture model containing 128 Gaussian components, and solve to obtain the parameter θ of the model. The Gaussian mixture model of 128 Gaussian units is written as p(a_t) = Σ_{i=1}^{128} π_i p_i(a_t), where a_t is a single local feature, p_i(a_t) represents the i-th Gaussian unit, d is the dimension of the local feature a_t, d = 192, and π_i, μ_i, Σ_i respectively represent the weight, mean and covariance matrix of the i-th Gaussian unit. All parameters θ = {π_i, μ_i, Σ_i, i = 1, …, 128} of the Gaussian mixture model are estimated from the training image dataset M1;
5b. On the basis of step 5a, encode the local feature set A_l of the l-th spatial region of image i with the Fisher vector of the first layer to obtain the Fisher vector b_l. After the Fisher vector coding of the first layer, the Fisher vector set B_i of image i is obtained;
5c. Perform PCA (principal component analysis) dimensionality reduction on the Fisher vector sets of all images in the training image dataset M1 and the test image dataset M2; the Fisher vector set of image i after dimensionality reduction is B_i';
5d. Model the dimension-reduced Fisher vector sets of all images in the training image dataset M1 with a Gaussian mixture model containing 32 Gaussian components, and solve to obtain the parameter θ' of the model;
5e. On the basis of step 5d, encode the dimension-reduced Fisher vector set B_i' of image i with the Fisher vector of the second layer to obtain the Fisher vector f_i, from which the final Fisher vector f_i of each image in the training image dataset M1 and the test image dataset M2 is obtained.
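The PCA step 5c that links the two Fisher vector layers can be sketched as follows. Sizes are toy values (a real first-layer FV built from 128 components over 192-dimensional features would be far longer), and the reduced target dimension of 32 is an assumption for illustration, not a value stated in the patent.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_images, R, fv_dim, d_red = 20, 90, 256, 32    # toy sizes
B = rng.random((n_images * R, fv_dim))          # all first-layer Fisher vectors

pca = PCA(n_components=d_red).fit(B)            # fit on the pooled vectors
B_red = pca.transform(B)                        # inputs to the second FV layer
print(B_red.shape)                              # (1800, 32)
```

Each image's 90 reduced rows form its set B_i', which the second-layer Fisher vector (with the 32-component GMM of step 5d) then encodes into the single final vector f_i.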
Step 6. Train a support vector machine with the Fisher vectors of the training image dataset M1 obtained in step 5e and their corresponding labels, then input the Fisher vectors of the test image dataset M2 into the trained classifier to classify the images.
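Step 6 with scikit-learn's linear SVM on synthetic "final Fisher vectors" (two well-separated Gaussian classes stand in for real features; everything here is illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_per, dim = 50, 64
# Two synthetic classes of final Fisher vectors with shifted means.
X_train = np.vstack([rng.normal(0, 1, (n_per, dim)),
                     rng.normal(3, 1, (n_per, dim))])
y_train = np.array([0] * n_per + [1] * n_per)

clf = LinearSVC().fit(X_train, y_train)         # train on M1 vectors + labels

X_test = np.vstack([rng.normal(0, 1, (10, dim)),
                    rng.normal(3, 1, (10, dim))])
y_test = np.array([0] * 10 + [1] * 10)
print(clf.score(X_test, y_test))                # near 1.0 on well-separated data
```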
The embodiments are only for illustrating the technical idea of the present invention, and the technical idea of the present invention is not limited thereto, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the scope of the present invention.
Claims (4)
1. The image classification method based on the noise reduction sparse automatic encoder and the density space sampling is characterized by comprising the following steps of:
(1) constructing an image block training set; the specific process of the step is as follows:
(1a) obtain an unlabeled image set {I_i, i = 1, …, N}, where I_i denotes the i-th image and N is the number of unlabeled images;
(1b) randomly extract M image blocks of size r × r from each image I_i, giving MN image blocks in total, which form the image block training set X = {x_j}, j = 1, …, MN;
(2) Constructing a noise reduction sparse automatic encoder of a single hidden layer, inputting the image block training set obtained in the step (1), and training the noise reduction sparse automatic encoder until an iteration stopping condition is met;
(3) perform density space sampling on each image in the training image dataset M1 and the test image dataset M2; let the number of images in the training image dataset M1 be m1 and the number of images in the test image dataset M2 be m2; the specific process of this step is as follows:
(3a) separately acquiring training image data sets M1And a test image dataset M2Each image of (a);
(3b) performing density space sampling on each image acquired in the step (3a) from left to right and from top to bottom in turn in an iterative manner by using a sliding window with variable dimensions, wherein each image sampling obtains R spatial regions; defining the size of an initial sliding window as [ w, h ], the step length of each sliding of the sliding window as t, and defining the space region acquired by each sliding of the sliding window as a quadruple Area (m, n, w, h); wherein m and n represent the coordinate position of the upper left corner of the sliding window in the image, and w and h represent the width and height of the initial sliding window;
(4) taking a trained noise reduction sparse automatic encoder as a feature extractor, and extracting local feature set information from a space region obtained by density space sampling of each image; the specific process of the step is as follows:
(4a) randomly extract a number of image blocks from each spatial region of each image obtained in step (3b);
(4b) take the trained noise reduction sparse automatic encoder as a feature extractor to obtain the local feature set A_l of the image blocks in the l-th spatial region of image i; each image obtains R local feature sets, corresponding to its R spatial regions;
(5) encoding the feature set information extracted in the step (4) by adopting two layers of stacked Fisher vectors to obtain a final Fisher Vector which is used as a feature Vector of image classification; the specific process of the step is as follows:
(5a) model all local feature sets of the training image dataset M1 with a Gaussian mixture model containing K Gaussian components, and solve to obtain the parameter θ of the model;
(5b) on the basis of step (5a), encode the local feature set of the l-th spatial region of image i with the Fisher vector of the first layer to obtain the Fisher vector b_l; after the Fisher vector coding of the first layer, the Fisher vector set B_i of image i is obtained;
(5c) perform PCA (principal component analysis) dimensionality reduction on the Fisher vector sets of all images in the training image dataset M1 and the test image dataset M2; the Fisher vector set of image i after dimensionality reduction is B_i';
(5d) model the dimension-reduced Fisher vector sets of all images in the training image dataset M1 with a Gaussian mixture model containing K' Gaussian components, and solve to obtain the parameter θ' of the model;
(5e) on the basis of step (5d), encode the dimension-reduced Fisher vector set B_i' of image i with the Fisher vector of the second layer to obtain the Fisher vector f_i, thereby obtaining the final Fisher vector of each image in the training image dataset M1 and the test image dataset M2;
(6) train a classifier with the Fisher vectors of the training image dataset M1 and their corresponding labels, then input the Fisher vectors of the test image dataset M2 into the trained classifier to classify the images.
2. The image classification method based on the noise reduction sparse automatic encoder and the density space sampling according to claim 1, wherein the specific process of the step (2) is as follows:
(2a) add noise to the image block training set X: randomly corrupt each image block x_j in X according to ratio q to obtain x_j' ~ q(x_j' | x_j), where x_j' is obtained by setting a fraction q of the values of the vector x_j to 0, j ∈ {1, …, MN};
(2b) randomly initialize the model parameters W1, W2, b1, b2; set the number of neurons in the input layer of the noise reduction sparse automatic encoder to n and the number of neurons in the hidden layer to s, where W1 ∈ R^(s×n) represents the encoding weight, W2 ∈ R^(n×s) the decoding weight, b1 ∈ R^s the encoding bias term, and b2 ∈ R^n the decoding bias term;
(2c) define the overall cost function Loss of the noise reduction sparse automatic encoder as

Loss = (1/MN) Σ_{j=1}^{MN} (1/2)‖x_j − x̂_j‖² + (λ/2)(‖W1‖² + ‖W2‖²) + β Σ_{t=1}^{s} KL(ρ ‖ ρ̂_t),

where x̂_j is the reconstruction of the noisy input x_j', λ is the weight attenuation coefficient, β is the sparse penalty weight, ρ is the target sparse value, ρ̂_t is the average response of all training samples on the t-th hidden neuron, and KL(ρ ‖ ρ̂_t) = ρ log(ρ/ρ̂_t) + (1−ρ) log((1−ρ)/(1−ρ̂_t));
(2d) calculate the gradient of the overall cost function Loss using the error back-propagation algorithm; solve the minimization problem of the overall cost function Loss by the limited-memory quasi-Newton method L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno), thereby obtaining the trained model parameters W1, b1, W2, b2.
3. The method for classifying an image based on a noise reduction sparse automatic encoder and density spatial sampling as claimed in claim 1, wherein in step (5a), the Gaussian mixture model of K Gaussian units is written as p(a_t) = Σ_{i=1}^{K} π_i p_i(a_t), where a_t is a single local feature, p_i(a_t) represents the i-th Gaussian unit, d represents the dimension of the local feature a_t, and π_i, μ_i, Σ_i respectively represent the weight, mean and covariance matrix of the i-th Gaussian unit; all parameters θ = {π_i, μ_i, Σ_i, i = 1, …, K} of the Gaussian mixture model are estimated from the training image dataset M1.
4. The method for classifying an image based on a noise reduction sparse automatic encoder and density spatial sampling according to claim 1, wherein in the step (6), the classifier adopts a support vector machine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810212714.7A CN108416389B (en) | 2018-03-15 | 2018-03-15 | Image classification method based on noise reduction sparse automatic encoder and density space sampling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108416389A CN108416389A (en) | 2018-08-17 |
CN108416389B true CN108416389B (en) | 2020-10-30 |
Family
ID=63131586
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810212714.7A Active CN108416389B (en) | 2018-03-15 | 2018-03-15 | Image classification method based on noise reduction sparse automatic encoder and density space sampling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108416389B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109389171B (en) * | 2018-10-11 | 2021-06-25 | 云南大学 | Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology |
CN115496761B (en) * | 2022-11-17 | 2023-03-03 | 湖南自兴智慧医疗科技有限公司 | AE-based method and system for phase-splitting screening of low power lens and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8553994B2 (en) * | 2008-02-05 | 2013-10-08 | Futurewei Technologies, Inc. | Compressive sampling for multimedia coding |
CN105404858A (en) * | 2015-11-03 | 2016-03-16 | 电子科技大学 | Vehicle type recognition method based on deep Fisher network |
CN105868796B (en) * | 2016-04-26 | 2019-03-01 | 中国石油大学(华东) | The design method of linear discriminant rarefaction representation classifier based on nuclear space |
CN107133640A (en) * | 2017-04-24 | 2017-09-05 | 河海大学 | Image classification method based on topography's block description and Fei Sheer vectors |
CN107609648B (en) * | 2017-07-21 | 2021-02-12 | 哈尔滨工程大学 | Genetic algorithm combined with stacking noise reduction sparse automatic encoder |
Non-Patent Citations (3)
Title |
---|
Action Recognition with Stacked Fisher Vectors;Xiaojiang Peng等;《European Conference on Computer Vision-ECCV2014》;20140912;第581-595页 * |
Human Action Recognition from Local Part Model;Feng Shi等;《2011 IEEE International Workshop on Haptic Audio Visual Environments and Games》;20111017;第1-4页 * |
Research on Image Classification Incorporating Shape Information (融入形状信息的图像分类研究);Ji Pengpeng;《China Master's Theses Full-text Database, Information Science and Technology》;20150715;pp. I138-1387 *
Also Published As
Publication number | Publication date |
---|---|
CN108416389A (en) | 2018-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107122809B (en) | Neural network feature learning method based on image self-coding | |
CN108804530B (en) | Subtitling areas of an image | |
CN109345508B (en) | Bone age evaluation method based on two-stage neural network | |
CN111160533B (en) | Neural network acceleration method based on cross-resolution knowledge distillation | |
CN108108751B (en) | Scene recognition method based on convolution multi-feature and deep random forest | |
Kong | Facial expression recognition method based on deep convolutional neural network combined with improved LBP features | |
CN110188827B (en) | Scene recognition method based on convolutional neural network and recursive automatic encoder model | |
CN111475622A (en) | Text classification method, device, terminal and storage medium | |
Sekma et al. | Human action recognition based on multi-layer fisher vector encoding method | |
Yang et al. | STA-TSN: Spatial-temporal attention temporal segment network for action recognition in video | |
CN111079514A (en) | Face recognition method based on CLBP and convolutional neural network | |
Kastaniotis et al. | HEp-2 cell classification with vector of hierarchically aggregated residuals | |
Ouyed et al. | Feature weighting for multinomial kernel logistic regression and application to action recognition | |
Jin et al. | Multiple graph regularized sparse coding and multiple hypergraph regularized sparse coding for image representation | |
Song et al. | MPPCANet: A feedforward learning strategy for few-shot image classification | |
Fan et al. | A hierarchical Dirichlet process mixture of generalized Dirichlet distributions for feature selection | |
CN108416389B (en) | Image classification method based on noise reduction sparse automatic encoder and density space sampling | |
Tabernik et al. | Towards deep compositional networks | |
CN112163114A (en) | Image retrieval method based on feature fusion | |
Yin et al. | Object recognition in art drawings: Transfer of a neural network | |
CN108388918B (en) | Data feature selection method with structure retention characteristics | |
CN113378021A (en) | Information entropy principal component analysis dimension reduction method based on semi-supervision | |
CN117115817A (en) | Cell morphology identification method and device based on multi-mode fusion | |
Fan et al. | [Retracted] Accurate Recognition and Simulation of 3D Visual Image of Aerobics Movement | |
CN114882288B (en) | Multi-view image classification method based on hierarchical image enhancement stacking self-encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||