CN103984959A

CN103984959A - Data-driven and task-driven image classification method

Info

Publication number: CN103984959A
Application number: CN201410224860.3A
Authority: CN
Inventors: 黄凯奇; 任伟强; 张俊格
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2014-05-26
Filing date: 2014-05-26
Publication date: 2014-08-13
Anticipated expiration: 2034-05-26
Also published as: CN103984959B

Abstract

The invention discloses an image classification method based on data-driven and task-driven learning. The method comprises the following steps: designing a convolutional neural network structure according to the scale of the data set and the image content; training a convolutional neural network model with a given labeled classification data set; extracting feature representations of the training set images with the trained convolutional neural network; and inputting the images to be tested into the trained convolutional neural network and classifying them. The method is based on nonlinear convolutional feature learning, so the model adapts to the data set in a data-driven manner and describes that specific data set better. The K-nearest-neighbor error is optimized directly in a task-driven manner, so better performance is obtained on K-nearest-neighbor tasks. Training can be performed efficiently on a GPU, while efficient K-nearest-neighbor image classification at test time needs only a CPU. The method is therefore well suited to large-scale image classification, retrieval and similar tasks.

Description

An image classification method based on data-driven and task-driven learning

Technical field

The present invention relates to the field of image classification in computer vision, and in particular to an image classification method based on data-driven and task-driven learning.

Background

Image classification is one of the most fundamental research problems in computer vision: given an image, the task is to automatically determine whether it contains objects of a certain class. Image classification is a core topic of vision research, and many other vision tasks depend on or involve it, such as object detection, tracking and segmentation in images, and object classification, detection, tracking, behavior analysis and gesture recognition in video.

K-nearest-neighbor image classification classifies an image by K-nearest-neighbor voting: the class that occurs most often among the K images nearest to a test sample is predicted as the class of that test sample. Besides classifying images simply and efficiently, K-nearest-neighbor classification has many other useful properties. For example, it returns the samples closest to the test image, so it can be applied to image retrieval, face retrieval, video retrieval and similar fields.

In conventional approaches, the choice of classifier and the image feature representation are two independent processes. K-nearest-neighbor classification, however, is a nonparametric model whose predictions depend heavily on the spatial distribution of the data, that is, on the image feature representation. As a result, the feature representation is not optimal for K-nearest-neighbor classification, and classification performance suffers.

In recent years the field of image classification has developed rapidly, with many important breakthroughs in classification methods. Currently, the bag-of-words model is one of the mainstream frameworks for image feature representation. It statistically summarizes low-level descriptors of densely extracted image patches to obtain a global representation of the image. A bag-of-words pipeline usually consists of low-level feature description, visual word (codebook) generation, low-level feature encoding, feature pooling, and classifier training and testing. Up to the classifier training stage, the bag-of-words model represents images in an unsupervised manner: neither traditional low-level features such as SIFT and HOG nor the mid-level features of the bag-of-words model use the image labels. Such feature representations are therefore usually not optimal for a nonparametric model such as K-nearest-neighbor classification.
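The voting rule described above can be sketched in a few lines. This is a minimal brute-force sketch, assuming Euclidean distance between feature vectors; the function name `knn_predict` and the toy data are hypothetical and only illustrate the majority vote.

```python
import numpy as np
from collections import Counter


def knn_predict(train_feats, train_labels, query, k):
    """Majority vote among the k training samples nearest to `query`.

    Brute-force search with Euclidean distance; returns (label, neighbor indices).
    """
    # Squared Euclidean distance from the query to every training sample.
    d2 = np.sum((train_feats - query) ** 2, axis=1)
    # Indices of the k smallest distances, nearest first (stable for ties).
    nearest = np.argsort(d2, kind="stable")[:k]
    # Most frequent label; on a tie in counts, Counter.most_common keeps the
    # label first encountered, i.e. the one held by the nearer neighbor.
    votes = Counter(train_labels[i].item() for i in nearest)
    return votes.most_common(1)[0][0], nearest


# Toy example: two well-separated groups of 2-D "features".
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1, 1])
label, idx = knn_predict(X, y, np.array([0.05, 0.1]), k=3)
```

Because the nearest indices are returned along with the label, the same routine also serves the retrieval use case mentioned above.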

Summary of the invention

In view of this, the main object of the present invention is to provide an image classification method based on data-driven and task-driven learning that classifies images faster and more accurately on large-scale image data sets.

To achieve the above object, the present invention adopts the following technical solution:

An image classification method based on data-driven and task-driven learning, comprising:

data set preparation: designing a convolutional neural network structure according to the scale of the data set and the image content;

model training: training a convolutional neural network model with a given labeled classification data set;

extracting feature representations of the training set images with the trained convolutional neural network;

inputting a test image into the trained convolutional neural network and classifying the test image in a K-nearest-neighbor manner.

Further, the data set preparation step of designing the convolutional neural network structure according to the data set scale and image content further comprises:

performing data augmentation in at least one of the following ways: 1) randomly cropping away the border regions around the original image to produce new sample images with slight differences; 2) adding random Gaussian noise to the original image pixels to produce new sample images;

scaling the image samples to a fixed size and flattening the pixels into a vector that serves as the input of the convolutional neural network.
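The two augmentation operations and the resize-and-flatten step can be sketched as follows. This is a minimal sketch, assuming grayscale images stored as 2-D NumPy arrays; the crop margin, noise level, nearest-neighbor resizing and the helper names are hypothetical choices, since the patent specifies none of them.

```python
import numpy as np


def random_crop(img, margin, rng):
    """Cut a random window that is `margin` pixels smaller than `img` in each
    spatial dimension, i.e. discard a random amount of border on each side."""
    h, w = img.shape[:2]
    top = rng.integers(0, margin + 1)    # rows removed from the top
    left = rng.integers(0, margin + 1)   # columns removed from the left
    return img[top:top + h - margin, left:left + w - margin]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to each pixel."""
    return img + rng.normal(0.0, sigma, size=img.shape)


def resize_nearest(img, size):
    """Nearest-neighbor resize of a 2-D (grayscale) image to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def to_input_vector(img, size):
    """Scale to the fixed network input size and flatten pixels into one vector."""
    return resize_nearest(img, size).reshape(-1)


rng = np.random.default_rng(0)
image = np.arange(32 * 32, dtype=float).reshape(32, 32)   # stand-in image
cropped = random_crop(image, margin=4, rng=rng)            # 28 x 28
noisy = add_gaussian_noise(cropped, sigma=2.0, rng=rng)
vec = to_input_vector(noisy, size=28)                      # length 784
```

Each call with a fresh random state yields a slightly different sample from the same original image, which is how augmentation enlarges a limited training set.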

Further, the model training step of training the convolutional neural network model with the given labeled classification data set specifically further comprises:

using the convolutional neural network as the basic feature transformation model;

training the convolutional neural network model with the expected error rate of neighborhood component analysis as the loss function;

training the network with a gradient-based optimization method, with the computation performed on a GPU.

Further, extracting feature representations of the training set images with the trained convolutional neural network comprises:

inputting all training images into the trained convolutional neural network and taking the response of the last fully connected layer as the feature representation of each training image.

Further, the feature representations of the training set images are organized into a KD-tree and stored in advance.

Further, inputting the test image into the trained convolutional neural network and classifying the test image in a K-nearest-neighbor manner comprises:

for a given test image, scaling the image to the input size of the convolutional neural network model, feeding it into the network for a forward pass, and taking the response of the last fully connected layer as the feature representation of the test image; using this representation to perform a K-nearest-neighbor search among the feature representations of the training set images; and predicting the class that occurs most often among the K training images with the nearest feature representations as the class of the test image.

Further, training the convolutional neural network model with the expected error rate of neighborhood component analysis as the loss function specifically comprises:

using neighborhood component analysis (NCA) to estimate the K-nearest-neighbor classification error: given N training samples {(x_i, y_i) | i = 1, ..., N}, where x_i is an image sample and y_i is its label, the probability that another sample x_j belongs to the same class as a sample x_i is defined as

p_{ij} = \frac{\exp(-\|F(x_i) - F(x_j)\|^2)}{\sum_{k \neq i} \exp(-\|F(x_i) - F(x_k)\|^2)},

where F(·) is the feature transformation function given by the convolutional neural network, and p_{ii} = 0 (a sample is not its own neighbor);

in neighborhood component analysis, the probability that sample x_i, which belongs to class c = y_i, is correctly classified is

p_i = \sum_{j:\, y_j = c} p_{ij},

and the expected error rate is

\begin{aligned} e_{NCA} &= 1 - \frac{1}{N} \sum_{i=1}^{N} p_i \\ &= 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j:\, y_j = y_i} p_{ij} \\ &= 1 - \frac{1}{N} \sum_{i,j=1}^{N} p_{ij} y_{ij}, \end{aligned}

where y_{ij} indicates whether samples x_i and x_j belong to the same class: y_{ij} = 1 if y_i = y_j, and y_{ij} = 0 otherwise. This expected error rate is an approximation of the K-nearest-neighbor classification error rate and is used as the loss function for network optimization.

Further, training the network with the gradient-based optimization method specifically uses one of the following: stochastic gradient descent, the conjugate gradient method, a quasi-Newton method, or L-BFGS.

Compared with the prior art, the above image classification method based on data-driven and task-driven learning provided by the present invention has the following advantages:

1) It is based on nonlinear convolutional feature learning, so the model adapts to the data set in a data-driven manner and describes the specific data set better.

2) By directly optimizing the K-nearest-neighbor error, the convolutional neural network is optimized in a task-driven manner, so better performance is obtained on K-nearest-neighbor tasks.

3) Training can be performed efficiently on a GPU, while at test time efficient K-nearest-neighbor image classification needs only a CPU, making the method well suited to large-scale image classification, retrieval and similar tasks.

Brief description of the drawings

Fig. 1 is a flow chart of model training and testing in the image classification method based on data-driven and task-driven learning according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the handwritten digit database MNIST according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the CIFAR-10 database according to an embodiment of the present invention;

Fig. 4 shows, according to an embodiment of the present invention, the result of reducing MNIST digits to 2 dimensions with the proposed method and a comparison with other methods; data points of different colors represent different digits.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The key ideas of the present invention are: 1) based on nonlinear convolutional feature learning, the model adapts to the data set in a data-driven manner and describes the specific data set better; 2) by directly optimizing the K-nearest-neighbor error, the convolutional neural network is optimized in a task-driven manner, so better performance is obtained on K-nearest-neighbor tasks; 3) training can be performed efficiently on a GPU, while at test time only a CPU is needed for efficient K-nearest-neighbor image classification, making the method well suited to large-scale image classification, retrieval and similar tasks.

As shown in Fig. 1, the upper half of Fig. 1 is the model training flow chart of the image classification method based on data-driven and task-driven learning according to an embodiment of the present invention. The figure shows the structure of the convolutional neural network: the first layers of the network use 5 × 5 convolution kernels; as the feature maps shrink, the later layers use 3 × 3 kernels; finally, fully connected layers with 128 and 64 neurons are attached. The training samples are fed into the network in sequence, the error is back-propagated from the last layer to the first using the neighborhood component analysis loss function, the gradient of each layer's parameters is computed, and the network is updated with these gradients by stochastic gradient descent, which realizes data-driven and task-driven learning of the network model. The lower half of Fig. 1 illustrates the test process of the method. Original images of different classes are input into the trained network, and the responses of the last fully connected layer are used as the image feature representations. These form a new nonlinear space in which images of different classes are better separated.
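The forward pass of such a network can be sketched in plain NumPy to show how the 64-dimensional feature F(x) is read off the last fully connected layer. This is a minimal sketch under stated assumptions: a 28 × 28 grayscale input, 16 and 32 convolution channels, ReLU activations, 2 × 2 max pooling after each convolution and He-style random initialization are all hypothetical choices; only the 5 × 5 and 3 × 3 kernels and the 128/64-unit fully connected layers come from the description of Fig. 1. Training is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, w, b):
    """'Valid' 2-D convolution (cross-correlation, as in CNN libraries).
    x: (C, H, W), w: (O, C, kh, kw), b: (O,) -> (O, H-kh+1, W-kw+1)."""
    kh, kw = w.shape[2:]
    win = sliding_window_view(x, (kh, kw), axis=(1, 2))   # (C, H', W', kh, kw)
    return np.einsum("chwij,ocij->ohw", win, w) + b[:, None, None]


def max_pool2(x):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng, in_ch=1, side=28):
    """Hypothetical widths: 16 and 32 conv channels; FC sizes 128 and 64 as in Fig. 1."""
    s = ((side - 4) // 2 - 2) // 2   # spatial side after conv5-pool-conv3-pool
    shapes = {"c1": (16, in_ch, 5, 5), "c2": (32, 16, 3, 3),
              "f1": (128, 32 * s * s), "f2": (64, 128)}
    p = {}
    for name, shp in shapes.items():
        fan_in = int(np.prod(shp[1:]))
        p[name + "_w"] = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shp)
        p[name + "_b"] = np.zeros(shp[0])
    return p


def features(p, img):
    """Forward pass; returns the 64-d response of the last FC layer, used as F(x)."""
    h = max_pool2(relu(conv2d(img, p["c1_w"], p["c1_b"])))   # 5x5 kernels
    h = max_pool2(relu(conv2d(h, p["c2_w"], p["c2_b"])))     # 3x3 kernels
    h = relu(p["f1_w"] @ h.reshape(-1) + p["f1_b"])          # FC, 128 units
    return p["f2_w"] @ h + p["f2_b"]                         # FC, 64 units


feat = features(init_params(np.random.default_rng(0)), np.zeros((1, 28, 28)))
```

The last layer is left linear here so that its response can be used directly as the feature vector; whether the original network applies a nonlinearity there is not stated.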

The method of the present invention comprises the following steps:

S1. Data set preparation: designing the convolutional neural network structure according to the data set scale and image content.

A convolutional neural network contains a large number of model parameters. The more parameters, the more complex the model and the more prone it is to overfitting during training, where the algorithm performs well on the training set but poorly on the test set. Balancing the scale of the training data against model complexity is an important way to prevent overfitting and obtain optimal performance. On the one hand, the larger the amount of data, the less likely the model is to overfit and the better it performs. Since the amount of data is usually limited, new data must be generated artificially in some way; we perform data augmentation as follows: 1) randomly cropping away the border regions around the original image to produce new sample images with slight differences; 2) adding random Gaussian noise to the original image pixels to produce new sample images. All image samples are scaled to a fixed size, and all pixels are flattened into a vector that serves as the input of the convolutional neural network. On the other hand, for a given training set size, the complexity of the convolutional neural network must be controlled accordingly. The complexity of a convolutional neural network is usually directly related to its structure: the more layers and the more nodes per layer, the more trainable parameters and the more complex the model.
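The link between structure and complexity can be made concrete by counting trainable parameters. This is a minimal sketch, assuming a hypothetical network for 28 × 28 grayscale input: a 5 × 5 convolution, 2 × 2 pooling, a 3 × 3 convolution, 2 × 2 pooling, then fully connected layers; the channel and unit counts are illustrative, not taken from the patent.

```python
def conv_params(in_ch, out_ch, k):
    """Trainable parameters of a k x k convolution: weights plus one bias per output channel."""
    return out_ch * in_ch * k * k + out_ch


def fc_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weight matrix plus biases."""
    return n_in * n_out + n_out


def cnn_param_count(c1, c2, fc1, fc2=64, in_ch=1, side=28):
    """Hypothetical net: conv5x5(c1) -> pool2 -> conv3x3(c2) -> pool2 -> FC(fc1) -> FC(fc2)."""
    s = ((side - 4) // 2 - 2) // 2   # spatial side after the two conv/pool stages
    return (conv_params(in_ch, c1, 5) + conv_params(c1, c2, 3)
            + fc_params(c2 * s * s, fc1) + fc_params(fc1, fc2))


small = cnn_param_count(16, 32, 128)   # 115,840 parameters
wide = cnn_param_count(32, 64, 256)    # 445,632 parameters
```

Doubling the widths roughly quadruples the parameter count, and in this configuration the first fully connected layer accounts for most of the parameters, so it is the natural place to trade capacity against the amount of available training data.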

S2. Training the convolutional neural network with the expected error rate of neighborhood component analysis as the loss function; stochastic gradient descent is used as the optimization method, and the computation is performed on a GPU.

Convolutional neural networks are widely used in image classification, detection, segmentation and other fields. In these applications, the network is usually trained with a generic classification criterion such as logistic regression, softmax regression or cross-entropy. A network trained with such a generic criterion can predict results directly, but it is usually not optimal when used directly for K-nearest-neighbor classification. Generic classification is end-to-end and predicts directly from the image, whereas a K-nearest-neighbor method computes a feature representation for each image, uses these representations for a K-nearest-neighbor search, and determines the class of a test sample from the classes of its K nearest samples. If a network trained with a generic classification criterion is used to compute image feature representations (for example, taking the response of the last layer of the network as the feature), there is no guarantee that these representations suit nearest-neighbor classification. To learn feature representations better suited to the K-nearest-neighbor problem, we use a new training criterion that optimizes the K-nearest-neighbor error directly, so that the learned feature representation is tailored to the K-nearest-neighbor problem.

Using the K-nearest-neighbor error directly as the objective function is not feasible: it counts misclassifications and is therefore piecewise constant, whereas updating the network by stochastic gradient descent requires a continuously differentiable objective. We therefore use neighborhood component analysis (NCA) to approximate the K-nearest-neighbor classification error. Given N training samples {(x_i, y_i) | i = 1, ..., N}, where x_i is an image sample and y_i is its label, the probability that another sample x_j belongs to the same class as a sample x_i is defined as

p_{ij} = \frac{\exp(-\|F(x_i) - F(x_j)\|^2)}{\sum_{k \neq i} \exp(-\|F(x_i) - F(x_k)\|^2)}    (1)

where F(·) is a highly complex nonlinear transformation that maps an input image to a feature vector. Here F(·) is represented by the convolutional neural network, and the response of the last layer of the network is taken as the feature representation; p_{ii} = 0, since the sum in the denominator excludes k = i. Formula (1) shows that the probability that sample x_j belongs to the same class as x_i decreases as the Euclidean distance between their feature representations grows: it is proportional to the exponential of the negative squared distance.
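Formula (1) is a row-wise softmax over negative squared distances, which can be sketched as follows. This is a minimal NumPy sketch that takes precomputed feature vectors F(x_i) as the rows of a matrix; the function name `nca_probabilities` is hypothetical.

```python
import numpy as np


def nca_probabilities(Z):
    """p[i, j] of Formula (1) for features Z of shape (N, D); p[i, i] = 0."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T   # squared distances ||z_i - z_j||^2
    np.fill_diagonal(d2, np.inf)                     # exclude k = i from the sum
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)      # shift for numerical stability
    e = np.exp(logits)                               # exp(-inf) = 0 on the diagonal
    return e / e.sum(axis=1, keepdims=True)          # each row sums to 1


Z = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
P = nca_probabilities(Z)
```

In this toy example the point at distance 1 from x_0 receives far more probability than the point at distance 3, illustrating the exponential decay with squared distance.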

In K-nearest-neighbor classification, the predicted class of a test sample is the class that occurs most often among its K nearest samples. In neighborhood component analysis, the probability that sample x_i, which belongs to class c = y_i, is correctly classified is

p_i = \sum_{j:\, y_j = c} p_{ij}    (2)

The expected error rate can therefore be defined as

\begin{aligned} e_{NCA} &= 1 - \frac{1}{N} \sum_{i=1}^{N} p_i \\ &= 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j:\, y_j = y_i} p_{ij} \\ &= 1 - \frac{1}{N} \sum_{i,j=1}^{N} p_{ij} y_{ij} \end{aligned}    (3)

where y_{ij} indicates whether samples x_i and x_j belong to the same class: y_{ij} = 1 if y_i = y_j, and y_{ij} = 0 otherwise. The expected error rate in Formula (3) approximates the K-nearest-neighbor classification error rate, and in this patent it is used as the objective function for network optimization. The neighborhood component loss defined by Formula (3) is continuously differentiable, so the network can conveniently be trained with gradient-based optimization methods such as stochastic gradient descent, the conjugate gradient method, quasi-Newton methods, or L-BFGS.
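The loss of Formula (3) and its gradient with respect to the features can be sketched in NumPy, together with a mini-batch stochastic gradient descent loop. This is a minimal sketch under stated assumptions: the gradient is derived by hand via the softmax derivative (a deep learning framework would obtain it by automatic differentiation), and a hypothetical linear map F(x) = A x stands in for the convolutional network, so the loop illustrates the optimization only, not the patent's actual network or hyperparameters. In a real network, `grad` would be back-propagated through the layers instead of into A.

```python
import numpy as np


def nca_loss_and_grad(Z, y):
    """Expected error e_NCA of Formula (3) and its gradient w.r.t. the features Z.

    Z: (N, D) feature vectors F(x_i); y: (N,) integer labels.
    """
    y = np.asarray(y)
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(d2, np.inf)                       # p_ii = 0
    logits = -d2 - np.max(-d2, axis=1, keepdims=True)  # stable softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                  # p_ij, Formula (1)
    same = (y[:, None] == y[None, :]).astype(float)    # y_ij
    p = np.sum(P * same, axis=1)                       # p_i, Formula (2)
    loss = 1.0 - p.mean()                              # e_NCA, Formula (3)
    # dLoss/d(d2[i, k]) = -(1/N) * p_ik * (p_i - y_ik)   (softmax derivative)
    W = -(P * (p[:, None] - same)) / n
    S = W + W.T
    # d2[i, k] = ||z_i - z_k||^2  =>  dLoss/dz_a = 2 * sum_k S[a, k] * (z_a - z_k)
    grad = 2.0 * (S.sum(axis=1)[:, None] * Z - S @ Z)
    return loss, grad


def train_linear_nca(X, y, dim, lr=0.1, epochs=100, batch=32, seed=0):
    """Mini-batch SGD on e_NCA for a hypothetical linear map F(x) = A x."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, 0.1, size=(dim, X.shape[1]))
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            idx = order[start:start + batch]
            if len(idx) < 2:                   # NCA needs at least one neighbor
                continue
            _, g = nca_loss_and_grad(X[idx] @ A.T, y[idx])
            A -= lr * g.T @ X[idx]             # chain rule through Z = X A^T
    return A


rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 5)) + [3, 0, 0, 0, 0],
               rng.normal(0, 1, (40, 5)) - [3, 0, 0, 0, 0]])
y = np.repeat([0, 1], 40)
A = train_linear_nca(X, y, dim=2)
```

Mini-batches are used because Formula (3) couples every pair of samples; on a batch, p_ij is computed only among the samples of that batch, which approximates the full objective.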

S3. Inputting all training images into the trained convolutional neural network and taking the response of the last fully connected layer as the feature representation of each training image. After training, the network is used as the feature transformation function F(·) to extract features from images. K-nearest-neighbor classification is a nonparametric model with no parameters of its own, so it suffices to input each training image into the network and take the response of the last fully connected layer as the feature representation of that sample. The extracted feature representations of the training set can be stored in advance and used for K-nearest-neighbor retrieval at test time.

S4. Inputting the test image into the trained convolutional neural network, taking the response of the last fully connected layer as the feature representation of the image, using this representation to classify the image in a K-nearest-neighbor manner against the corresponding feature representations of the training set images, and predicting the class that occurs most often among the K images with the closest feature representations as the class of the test image. The feature representation of the test image is compared with the pre-extracted feature representations of the training set images. A naive search compares it with every training sample, so its time complexity is proportional to the training set size, which is clearly expensive for a very large training set. To speed up the K-nearest-neighbor search, the KD-tree, a data structure commonly used in nearest-neighbor retrieval, is adopted; it recursively partitions the feature space (the "k" in KD-tree is the feature dimension, not the number of neighbors K) to accelerate the neighbor search.
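The KD-tree search and the final vote can be sketched as follows. This is a minimal pure-NumPy sketch, assuming median splits that cycle through the coordinates and small leaf buckets; the class `KDTree`, the helper `knn_classify` and the `leaf_size` parameter are hypothetical names for illustration, not the patent's implementation. A branch is skipped when the splitting plane is already farther from the query than the current K-th nearest neighbor, which is where the speed-up over brute force comes from.

```python
import heapq
from collections import Counter

import numpy as np


class KDTree:
    """Minimal KD-tree over the rows of `points` (median split, cycling axes)."""

    def __init__(self, points, leaf_size=8):
        self.points = np.asarray(points, dtype=float)
        self.leaf_size = leaf_size
        self.root = self._build(np.arange(len(self.points)), depth=0)

    def _build(self, idx, depth):
        if len(idx) <= self.leaf_size:
            return ("leaf", idx)
        axis = depth % self.points.shape[1]
        idx = idx[np.argsort(self.points[idx, axis], kind="stable")]
        mid = len(idx) // 2
        split = self.points[idx[mid], axis]   # left half <= split <= right half
        return ("node", axis, split,
                self._build(idx[:mid], depth + 1),
                self._build(idx[mid:], depth + 1))

    def query(self, q, k):
        """Return indices of the k nearest points to q, nearest first."""
        if k < 1:
            raise ValueError("k must be at least 1")
        q = np.asarray(q, dtype=float)
        heap = []  # max-heap of the best k so far, stored as (-squared distance, index)

        def visit(node):
            if node[0] == "leaf":
                for i in node[1]:
                    d2 = float(np.sum((self.points[i] - q) ** 2))
                    if len(heap) < k:
                        heapq.heappush(heap, (-d2, int(i)))
                    elif d2 < -heap[0][0]:
                        heapq.heapreplace(heap, (-d2, int(i)))
                return
            _, axis, split, left, right = node
            diff = q[axis] - split
            near, far = (left, right) if diff < 0 else (right, left)
            visit(near)
            # The far side can only help if the splitting plane is closer than
            # the current k-th nearest distance (or fewer than k points found).
            if len(heap) < k or diff ** 2 < -heap[0][0]:
                visit(far)

        visit(self.root)
        return [i for _, i in sorted(heap, key=lambda t: -t[0])]


def knn_classify(tree, labels, q, k):
    """Majority label among the k nearest stored features."""
    nearest = tree.query(q, k)
    return Counter(labels[i].item() for i in nearest).most_common(1)[0][0]


rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 4))            # stand-in for stored CNN features
train_labels = (train_feats[:, 0] > 0).astype(int)
tree = KDTree(train_feats)                          # built once, before testing
pred = knn_classify(tree, train_labels, rng.normal(size=4), k=5)
```

In practice a library implementation such as SciPy's `scipy.spatial.KDTree` could be used instead. KD-trees prune most effectively in low dimensions, so for 64-dimensional features the actual speed-up depends on the data.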

Fig. 2 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the handwritten digit database MNIST according to an embodiment of the present invention. The left panel shows samples from the MNIST database, and the right panel shows the first-layer convolution kernels of the trained network; the learned kernels essentially correspond to digit strokes.

Fig. 3 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the CIFAR-10 database according to an embodiment of the present invention. As the right panel shows, the learned kernels are edges and simple low-level visual patterns.

Fig. 4 shows, according to an embodiment of the present invention, the result of reducing MNIST digits to 2 dimensions with the method proposed in this patent and a comparison with other methods; data points of different colors represent different digits. The feature representations of different digits form well-separated clusters, which makes them suitable for subsequent K-nearest-neighbor classification.

In summary, the present invention proposes a new image classification method based on data-driven and task-driven learning, which uses a deep convolutional neural network as the feature representation model and trains it on a GPU with stochastic gradient descent. Once trained, the network is used to extract feature representations from images, and images are classified by K-nearest-neighbor search. Experiments show that, compared with mainstream K-nearest-neighbor-based image classification algorithms, the invention yields highly discriminative feature representations, trains the model in a data-driven and task-driven manner, and has a very efficient test process, making it suitable for K-nearest-neighbor image classification and retrieval on large-scale data.

The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. An image classification method based on data-driven and task-driven learning, characterized by comprising:

2. The image classification method based on data-driven and task-driven learning according to claim 1, characterized in that the data set preparation step of designing the convolutional neural network structure according to the data set scale and image content further comprises:

3. The image classification method based on data-driven and task-driven learning according to claim 1, characterized in that the data set preparation step of designing the convolutional neural network structure according to the data set scale and image content further comprises:

4. The image classification method based on data-driven and task-driven learning according to claim 1, characterized in that the model training step of training the convolutional neural network model with the given labeled classification data set specifically further comprises:

5. The image classification method based on data-driven and task-driven learning according to claim 1, characterized in that extracting feature representations of the training set images with the trained convolutional neural network comprises:

6. The image classification method based on data-driven and task-driven learning according to claim 5, characterized in that the feature representations of the training set images are organized into a KD-tree and stored in advance.

7. The image classification method based on data-driven and task-driven learning according to claim 1, characterized in that inputting the test image into the trained convolutional neural network and classifying the test image in a K-nearest-neighbor manner comprises:

8. The image classification method based on data-driven and task-driven learning according to claim 4, characterized in that training the convolutional neural network model with the expected error rate of neighborhood component analysis as the loss function specifically comprises:

using neighborhood component analysis (NCA) to estimate the K-nearest-neighbor classification error: given N training samples {(x_i, y_i) | i = 1, ..., N}, where x_i is an image sample and y_i is its label, the probability that another sample x_j belongs to the same class as a sample x_i is defined as

p_{ij} = \frac{\exp(-\|F(x_i) - F(x_j)\|^2)}{\sum_{k \neq i} \exp(-\|F(x_i) - F(x_k)\|^2)},

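where F(·) is the feature transformation function given by the convolutional neural network, and p_{ii} = 0;

in neighborhood component analysis, the probability that sample x_i, which belongs to class c = y_i, is correctly classified is

p_i = \sum_{j:\, y_j = c} p_{ij},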

and the expected error rate is

\begin{aligned} e_{NCA} &= 1 - \frac{1}{N} \sum_{i=1}^{N} p_i \\ &= 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j:\, y_j = y_i} p_{ij} \\ &= 1 - \frac{1}{N} \sum_{i,j=1}^{N} p_{ij} y_{ij}, \end{aligned}

where y_{ij} indicates whether samples x_i and x_j belong to the same class: y_{ij} = 1 if y_i = y_j, and y_{ij} = 0 otherwise; this expected error rate is an approximation of the K-nearest-neighbor classification error rate and is used as the loss function for network optimization.

9. The image classification method based on data-driven and task-driven learning according to claim 4, characterized in that training the network with the gradient-based optimization method specifically uses one of the following: stochastic gradient descent, the conjugate gradient method, a quasi-Newton method, or L-BFGS.