CN103984959B

CN103984959B - A data-driven and task-driven image classification method

Info

Publication number: CN103984959B
Application number: CN201410224860.3A
Authority: CN
Inventors: 黄凯奇; 任伟强; 张俊格
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2014-05-26
Filing date: 2014-05-26
Publication date: 2017-07-21
Anticipated expiration: 2034-05-26
Also published as: CN103984959A

Abstract

The invention discloses a data-driven and task-driven image classification method. The method includes: designing a convolutional neural network structure according to the scale of the data set and the image content; training the convolutional neural network model on a given classification data set; extracting feature representations of the training-set images with the trained convolutional neural network; and inputting a test image into the trained convolutional neural network and classifying it. The method is based on nonlinear convolutional feature learning, so the model adapts to the data set in a data-driven manner and describes a specific data set better. By directly optimizing the K-nearest-neighbor error in a task-driven manner, it achieves better performance on K-nearest-neighbor tasks. Training can be carried out efficiently on a GPU, while efficient K-nearest-neighbor image classification at test time needs only a CPU, which makes the method well suited to large-scale image classification, retrieval and similar tasks.

Description

A data-driven and task-driven image classification method

Technical field

The present invention relates to the field of image classification in computer vision, and more particularly to a data-driven and task-driven image classification method.

Background technology

Image classification is one of the most basic research problems in computer vision: given an image, decide automatically whether it contains objects of a certain class. It is a core topic of vision research, and many other vision problems rely on or involve it, such as object detection, tracking and segmentation in images, and object classification, detection, tracking, behavior analysis and gesture recognition in video.

K-nearest-neighbor image classification is an image classification method that classifies an image by K-nearest-neighbor voting: the class that occurs most often among the K nearest images is predicted as the class of the test sample. Besides classifying images simply and efficiently, K-nearest-neighbor classification has other useful properties. For example, it retrieves the samples closest to the test image, so it can be applied to image retrieval, face retrieval, video retrieval and similar fields.

In conventional techniques, the choice of classifier and the image feature representation are two independent processes. K-nearest-neighbor classification is a nonparametric model whose predictions depend heavily on the spatial distribution of the data, that is, on the image feature representation. As a result, the image feature representation is not optimal for K-nearest-neighbor classification, and classification performance suffers.

In recent years the field of image classification has developed rapidly, and many important breakthroughs have been made in classification techniques. At present the bag-of-words model is one of the mainstream frameworks for image feature representation. It computes statistics over low-level feature descriptors of densely extracted image patches to obtain a global feature representation of the image. The bag-of-words pipeline generally consists of low-level feature description, visual word generation, low-level feature coding, feature pooling, and classifier training and testing. Before classifier training, the bag-of-words model can be regarded as representing the image in an unsupervised manner: neither traditional low-level features such as SIFT and HOG nor the mid-level features of the bag-of-words model use the label information of the image. Such feature representations are therefore usually not optimal for a nonparametric model such as K-nearest-neighbor classification.

Summary of the invention

In view of this, the main object of the present invention is to provide a data-driven and task-driven image classification method that achieves faster and more accurate image classification on large-scale image data sets.

To achieve the above object, the present invention adopts the following technical solution:

A data-driven and task-driven image classification method, including:

Data set preparation: designing a convolutional neural network structure according to the scale of the data set and the image content;

Model training: training the convolutional neural network model on a given classification data set;

Extracting feature representations of the training-set images with the trained convolutional neural network;

Inputting a test image into the trained convolutional neural network and classifying the test image by the K-nearest-neighbor method.

Further, the data set preparation, in which the convolutional neural network structure is designed according to the scale of the data set and the image content, also includes:

Data augmentation is achieved by at least one of the following: 1) randomly cropping the original image to remove part of the border around it, producing new sample images that differ slightly from the original; 2) adding random Gaussian noise to the pixels of the original image to produce new sample images.

Image samples are scaled to a fixed size, and the pixels are flattened into a vector that serves as the input of the convolutional neural network.
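The two augmentation modes and the flattening step can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function names, the toy 8 × 8 grayscale image, the crop margin and the noise level `sigma` are all assumptions, and the scaling step is omitted because the crop already produces a fixed output size.

```python
import random

def random_crop(image, margin, rng):
    """Remove `margin` rows and `margin` columns of border in total, at a random offset."""
    height, width = len(image), len(image[0])
    out_h, out_w = height - margin, width - margin
    top = rng.randint(0, margin)   # randint is inclusive on both ends
    left = rng.randint(0, margin)
    return [row[left:left + out_w] for row in image[top:top + out_h]]

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clamp the result to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def flatten(image):
    """Stretch the 2-D pixel grid into a single input vector (row-major order)."""
    return [p for row in image for p in row]

rng = random.Random(0)
# Hypothetical 8x8 grayscale image with pixel values in [0, 1].
original = [[(r * 8 + c) / 63.0 for c in range(8)] for r in range(8)]
cropped = random_crop(original, margin=2, rng=rng)          # 6x6 sample
noisy = add_gaussian_noise(original, sigma=0.05, rng=rng)   # 8x8 sample
vector = flatten(cropped)                                    # 36 values
```

Each call with a fresh random offset or noise draw yields a slightly different sample from the same original image, which is the intended effect of the augmentation.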

Further, the model training, in which the convolutional neural network model is trained on a given classification data set, specifically includes:

Using a convolutional neural network as the basic feature transformation model;

Training the convolutional neural network model with the expected error rate of neighborhood component analysis as the loss function;

Performing network training with a gradient-based optimization method, with computation carried out on a GPU.

Further, extracting feature representations of the training-set images with the trained convolutional neural network includes:

Inputting all training images into the trained convolutional neural network and taking the response of the last fully connected layer as the feature representation of each training image.

Further, the feature representations of the training-set images are built into a KD-tree and stored in advance.

Further, inputting a test image into the trained convolutional neural network and classifying the test image by the K-nearest-neighbor method includes:

For a given test image, the image is scaled to the input size of the convolutional neural network model and fed into the network for forward computation. The response of the last fully connected layer is taken as the feature representation of the test image. This representation is used for K-nearest-neighbor retrieval among the feature representations of the training-set images, and the class that occurs most often among the K training images with the nearest feature representations is predicted as the class of the test image.
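The voting rule in the preceding paragraph can be illustrated with a brute-force sketch. It assumes the feature representations have already been extracted by the trained network; the function names, the two-dimensional toy features and the labels are hypothetical.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_features, train_labels, query, k):
    """Predict the label that occurs most often among the k nearest training features."""
    order = sorted(range(len(train_features)),
                   key=lambda i: squared_distance(train_features[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical pre-extracted features and labels.
train_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9]]
train_labels = ["cat", "cat", "cat", "dog", "dog"]
prediction = knn_predict(train_features, train_labels, query=[0.3, 0.2], k=3)
```

This version compares the query with every training sample, so its cost grows linearly with the training set size.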

Further, training the convolutional neural network model with the expected error rate of neighborhood component analysis as the loss function specifically includes:

The K-nearest-neighbor classification error is estimated with neighborhood component analysis (NCA). Given N training pairs {(x_i, y_i) | i = 1, ..., N}, where x_i is an image sample and y_i is its label, for a sample x_i the probability that another sample x_j belongs to the same class as x_i is defined as

p_{ij} = \frac{e^{-\|F(x_i) - F(x_j)\|^2}}{\sum_{k \neq i} e^{-\|F(x_i) - F(x_k)\|^2}}, \tag{1}

where F(·) is the feature transformation function given by the convolutional neural network;

For neighborhood component analysis, the probability that sample x_i belongs to class c and is classified correctly is

p_i = \sum_{j:\, y_j = c} p_{ij}, \tag{2}

The expected error rate is

\begin{aligned} e_{NCA} &= 1 - \frac{1}{N} \sum_{i=1}^{N} p_i \\ &= 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j:\, y_j = c} p_{ij} \\ &= 1 - \frac{1}{N} \sum_{i,j=1}^{N} p_{ij} y_{ij} \end{aligned} \tag{3}

where y_ij indicates whether samples x_i and x_j belong to the same class: y_ij = 1 when y_i = y_j, and y_ij = 0 otherwise. The expected error rate in formula (3) is an approximation of the K-nearest-neighbor classification error rate and is used as the loss function for network optimization.

Further, the gradient-based optimization method used for network training is one of the following: stochastic gradient descent, the conjugate gradient method, a quasi-Newton method, or L-BFGS.

Compared with the prior art, the data-driven and task-driven image classification method provided by the present invention has the following advantages:

1) It is based on nonlinear convolutional feature learning, so the model adapts to the data set in a data-driven manner and describes a specific data set better.

2) By directly optimizing the K-nearest-neighbor error, the convolutional neural network is optimized in a task-driven manner and achieves better performance on K-nearest-neighbor tasks.

3) Training can be carried out efficiently on a GPU, while efficient K-nearest-neighbor image classification at test time needs only a CPU, which makes the method well suited to large-scale image classification, retrieval and similar tasks.

Brief description of the drawings

Fig. 1 is a flow chart of model training and testing for the data-driven and task-driven image classification method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the MNIST handwritten digit database according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the CIFAR-10 database according to an embodiment of the present invention;

Fig. 4 shows the result of reducing MNIST digits to 2 dimensions with the method proposed by the present invention according to an embodiment of the present invention, together with a comparison with other methods; data points of different colors represent different digits.

Detailed description of the embodiments

To make the object, technical solution and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.

The main ideas of the present invention are: 1) it is based on nonlinear convolutional feature learning, so the model adapts to the data set in a data-driven manner and describes a specific data set better; 2) by directly optimizing the K-nearest-neighbor error, the convolutional neural network is optimized in a task-driven manner and achieves better performance on K-nearest-neighbor tasks; 3) training can be carried out efficiently on a GPU, while efficient K-nearest-neighbor image classification at test time needs only a CPU, which makes the method well suited to large-scale image classification, retrieval and similar tasks.

As shown in Fig. 1, the upper half of Fig. 1 is the model training flow chart of the data-driven and task-driven image classification method according to an embodiment of the present invention. The figure shows the structure of the convolutional neural network: the convolution kernels of the earlier layers are 5 × 5; as the feature maps shrink, the later layers use 3 × 3 kernels; the network ends with fully connected layers of 128 and 64 neurons. Training-set samples are fed into the convolutional neural network in sequence, the error is propagated from back to front using the neighborhood component analysis loss function, the gradient of the parameters of each layer is computed, and the network is updated with this gradient by stochastic gradient descent, which realizes data-driven and task-driven learning of the network model. The lower half of Fig. 1 illustrates the test procedure of the method. Original images of different classes are input into the trained convolutional neural network, and the response of the last fully connected layer is taken as the feature representation of each image. These representations form a new nonlinear space in which images of different classes are separated better.

The method of the present invention comprises the following steps:

S1, data set preparation: the convolutional neural network structure is designed according to the scale of the data set and the image content.

A convolutional neural network contains a large number of model parameters. More parameters mean a more complex model, which overfits more easily during training: the algorithm performs very well on the training set but very poorly on the test set. Balancing the scale of the training data against model complexity is an important way to prevent overfitting and obtain the best performance. On the one hand, the larger the data volume, the less prone model training is to overfitting and the better the performance. The data volume is usually limited, however, so new data must be generated artificially. Data augmentation is achieved in the following ways: 1) randomly cropping the original image to remove part of the border around it, producing new sample images that differ slightly from the original; 2) adding random Gaussian noise to the pixels of the original image to produce new sample images. All image samples are scaled to a fixed size, and all pixels are flattened into a vector that serves as the input of the convolutional neural network. On the other hand, for a given training set scale, the model complexity of the convolutional neural network must be controlled accordingly. The model complexity is generally directly related to the structure of the model: the more layers the network has and the more nodes each layer has, the more trainable parameters there are and the more complex the model is.
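The link between network structure and the number of trainable parameters can be made concrete with a small count. The sketch below is illustrative only: the kernel sizes (5 × 5, 3 × 3) and the fully connected sizes (128, 64) follow the structure described for Fig. 1, while the channel counts and the 4 × 4 spatial size before the first fully connected layer are assumptions chosen for the example.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel of a square-kernel convolution layer."""
    return (kernel * kernel * in_channels + 1) * out_channels

def fc_params(in_features, out_features):
    """Weights plus one bias per output neuron of a fully connected layer."""
    return (in_features + 1) * out_features

# Hypothetical layer widths: 1 input channel, 16 and 32 feature maps,
# and a 32 x 4 x 4 feature volume feeding the first fully connected layer.
layers = [
    ("conv 5x5", conv_params(5, 1, 16)),
    ("conv 3x3", conv_params(3, 16, 32)),
    ("fc 128", fc_params(32 * 4 * 4, 128)),
    ("fc 64", fc_params(128, 64)),
]
total = sum(count for _, count in layers)
```

Under these assumed widths the fully connected layers dominate the count, so widening or adding layers there raises model complexity fastest.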

S2, the convolutional neural network is trained with the expected error rate of neighborhood component analysis as the loss function; the optimization method is stochastic gradient descent, and computation is carried out on a GPU.

Convolutional neural networks are widely used in image classification, detection, segmentation and other fields. In these applications they are usually trained with general-purpose object classification criteria such as logistic regression, Softmax regression or cross entropy. A network trained with such a general criterion can predict results directly, but it is usually not optimal when applied directly to K-nearest-neighbor classification. General classification is end to end and predicts directly from the image, whereas the K-nearest-neighbor approach computes a feature representation for every image, performs K-nearest-neighbor retrieval with that representation, and determines the class of the test sample from the classes of the K nearest samples. If a network trained with a general classification criterion is used to compute image feature representations (for example by taking the response of the last layer of the network as the feature), there is no guarantee that the representation suits the nearest-neighbor classification scenario. To learn a feature representation better suited to the K-nearest-neighbor problem, a new training criterion is used that optimizes the K-nearest-neighbor error directly, so that the learned feature representation is optimal for the K-nearest-neighbor problem.

Using the K-nearest-neighbor error directly as the objective function is not feasible, because the objective function must be continuously differentiable so that the network can be updated by stochastic gradient descent. Neighborhood component analysis (NCA) is therefore used to approximate the K-nearest-neighbor classification error. Given N training pairs {(x_i, y_i) | i = 1, ..., N}, where x_i is an image sample and y_i is its label, for a sample x_i the probability that another sample x_j belongs to the same class as x_i is defined as

p_{ij} = \frac{e^{-\|F(x_i) - F(x_j)\|^2}}{\sum_{k \neq i} e^{-\|F(x_i) - F(x_k)\|^2}}, \tag{1}

where F(·) is a highly complex nonlinear transformation function that converts an input image into a feature vector. Here F(·) is represented by a convolutional neural network, and the response of the last layer of the network is taken as the feature representation. Formula (1) shows that the probability that samples x_j and x_i fall into the same class decreases as the Euclidean distance between their feature representations grows.

In K-nearest-neighbor classification, the predicted class of a test sample is the class that occurs most often among its K nearest samples. For neighborhood component analysis, the probability that sample x_i belongs to class c and is classified correctly is

p_i = \sum_{j:\, y_j = c} p_{ij}, \tag{2}

The expected error rate can then be defined as

\begin{aligned} e_{NCA} &= 1 - \frac{1}{N} \sum_{i=1}^{N} p_i \\ &= 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j:\, y_j = c} p_{ij} \\ &= 1 - \frac{1}{N} \sum_{i,j=1}^{N} p_{ij} y_{ij} \end{aligned} \tag{3}

where y_ij indicates whether samples x_i and x_j belong to the same class: y_ij = 1 when y_i = y_j, and y_ij = 0 otherwise. The expected error rate in formula (3) is an approximation of the K-nearest-neighbor classification error rate, and this patent uses it as the objective function for network optimization. The neighborhood component loss function defined by formula (3) is continuously differentiable, so the network can readily be trained with gradient-based optimization methods such as stochastic gradient descent, the conjugate gradient method, quasi-Newton methods and L-BFGS.
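The expected error rate defined above can be evaluated numerically as in the following sketch. It is a plain-Python illustration rather than the GPU implementation of the embodiment: it takes already-computed feature vectors F(x_i) as input, the function name and toy data are hypothetical, and it assumes the usual NCA convention that a sample is never its own neighbor (p_ii = 0).

```python
import math

def nca_expected_error(features, labels):
    """Expected error rate 1 - (1/N) * sum_ij p_ij * y_ij, with p_ii taken as 0."""
    n = len(features)
    correct = 0.0
    for i in range(n):
        # Unnormalised affinities exp(-||f_i - f_k||^2) for k != i (assumed 0 for k == i).
        weights = [0.0 if k == i else
                   math.exp(-sum((a - b) ** 2
                                 for a, b in zip(features[i], features[k])))
                   for k in range(n)]
        norm = sum(weights)
        # p_i: probability mass on neighbours that share the label of x_i.
        same = sum(w for k, w in enumerate(weights) if labels[k] == labels[i])
        correct += same / norm
    return 1.0 - correct / n

# Hypothetical 2-D features: two tight, well-separated clusters.
features = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
separated = nca_expected_error(features, [0, 0, 1, 1])  # labels follow the clusters
mixed = nca_expected_error(features, [0, 1, 0, 1])      # labels cut across the clusters
```

When labels follow the clusters the value is close to 0, and when they cut across the clusters it is close to 1, which matches the role of the quantity as a smooth stand-in for the nearest-neighbor error. For features that lie very far apart the exponentials can underflow to zero, so a practical implementation would need a numerically stable normalisation.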

S3, all training images are input into the trained convolutional neural network, and the response of the last fully connected layer is taken as the feature representation of each training image. After training, the convolutional neural network serves as the feature transformation function F(·) that extracts features from images. K-nearest-neighbor classification is a nonparametric model with no parameters of its own: it is only necessary to input every training-set image into the convolutional neural network and take the response of the last fully connected layer as the feature representation of that sample. The extracted training-set feature representations can be stored in advance and used for K-nearest-neighbor retrieval in the test phase.

S4, the test image is input into the trained convolutional neural network, and the response of the last fully connected layer is taken as the feature representation of the image. This representation is used to classify the image by the K-nearest-neighbor method against the feature representations of the training-set images: the class that occurs most often among the K images with the closest feature representations is predicted as the class of the test image. Comparing the feature representation of the test image with the pre-extracted training-set representations normally requires one comparison with every training sample, so the time complexity is proportional to the size of the training set. For a very large training set this cost is clearly high. To speed up K-nearest-neighbor lookup, a KD-tree, a data structure commonly used in nearest-neighbor retrieval, is used to partition the K-dimensional space and accelerate the search.
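A KD-tree lookup of the kind described in step S4 might look like the sketch below. It is a minimal plain-Python illustration under stated assumptions: the dictionary-based node layout, the function names and the toy features are hypothetical, labels are assumed to be integers, and a production system would normally rely on an optimised library.

```python
import heapq
from collections import Counter

def build_kdtree(points, depth=0):
    """Recursively split (feature, label) pairs at the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def knn_search(node, query, k, heap=None):
    """Collect the k nearest stored points in a max-heap of (-squared distance, label)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    feature, label = node["point"]
    dist = sum((a - b) ** 2 for a, b in zip(feature, query))
    if len(heap) < k:
        heapq.heappush(heap, (-dist, label))
    elif dist < -heap[0][0]:
        heapq.heapreplace(heap, (-dist, label))
    diff = query[node["axis"]] - feature[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn_search(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the current k-th neighbour.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_search(far, query, k, heap)
    return heap

def classify(tree, query, k):
    """Majority vote over the labels of the k nearest stored features."""
    neighbours = knn_search(tree, query, k)
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical pre-extracted training features with integer class labels.
train = [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([0.1, 0.3], 0),
         ([2.0, 2.1], 1), ([2.2, 1.9], 1), ([1.9, 2.3], 1)]
tree = build_kdtree(train)
label = classify(tree, [0.3, 0.2], k=3)
```

The tree is built once from the stored training features; each query then skips any subtree whose splitting plane lies farther away than the current K-th neighbor, which is where the speed-up over exhaustive comparison comes from.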

Fig. 2 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the MNIST handwritten digit database according to an embodiment of the present invention. The left panel shows samples from the MNIST database, and the right panel shows the first-layer convolution kernels of the trained network; the kernels have essentially learned digit strokes.

Fig. 3 is a schematic diagram of the first-layer convolution kernels of a convolutional neural network trained with neighborhood component analysis on the CIFAR-10 database according to an embodiment of the present invention. The right panel shows that the learned kernels are edges and simple low-level visual patterns.

Fig. 4 shows the result of reducing MNIST digits to 2 dimensions with the method proposed in this patent according to an embodiment of the present invention, together with a comparison with other methods; data points of different colors represent different digits. The feature representations of different digits cluster well in space, which makes them suitable for subsequent classification by the K-nearest-neighbor method.

In summary, the present invention proposes a new data-driven and task-driven image classification method that uses a deep convolutional neural network as the feature representation model and trains the model on a GPU with stochastic gradient descent. After training, the convolutional neural network is used to extract feature representations from images, and images are classified by K-nearest-neighbor search. Tests show that, compared with mainstream K-nearest-neighbor image classification algorithms, the invention offers highly discriminative feature representations, data-driven and task-driven model training, and a highly efficient test procedure, and is suitable for K-nearest-neighbor image classification and retrieval on large-scale data.

The specific embodiments described above further explain the object, technical solution and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A data-driven and task-driven image classification method, characterized by including:

inputting a test image into the trained convolutional neural network and classifying the test image by the K-nearest-neighbor method;

wherein the model training, in which the convolutional neural network model is trained on a given classification data set, specifically further includes:

performing network training with a gradient-based optimization method, with computation carried out on a GPU;

training the convolutional neural network model with the expected error rate of neighborhood component analysis as the loss function, which specifically includes:

estimating the K-nearest-neighbor classification error with neighborhood component analysis (NCA): given N training pairs {(x_i, y_i) | i = 1, ..., N}, where x_i is an image sample and y_i is its label, for a sample x_i the probability that another sample x_j belongs to the same class as x_i is defined as

p_{ij} = \frac{e^{-\|F(x_i) - F(x_j)\|^2}}{\sum_{k \neq i} e^{-\|F(x_i) - F(x_k)\|^2}},

where F(·) is the feature transformation function of the convolutional neural network;

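for neighborhood component analysis, the probability that sample x_i belongs to class c and is classified correctly is

p_i = \sum_{j:\, y_j = c} p_{ij},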

the expected error rate is

\begin{aligned} e_{NCA} &= 1 - \frac{1}{N} \sum_{i=1}^{N} p_i \\ &= 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j:\, y_j = c} p_{ij} \\ &= 1 - \frac{1}{N} \sum_{i,j=1}^{N} p_{ij} y_{ij} \end{aligned},

where y_ij indicates whether samples x_i and x_j belong to the same class: y_ij = 1 when y_i = y_j, and y_ij = 0 otherwise; the expected error rate is an approximation of the K-nearest-neighbor classification error rate and is used as the loss function for network optimization.

2. The data-driven and task-driven image classification method according to claim 1, characterized in that the data set preparation, in which the convolutional neural network structure is designed according to the scale of the data set and the image content, further includes:

achieving data augmentation by at least one of the following: 1) randomly cropping the original image to remove part of the border around it, producing new sample images that differ slightly from the original; 2) adding random Gaussian noise to the pixels of the original image to produce new sample images.

3. The data-driven and task-driven image classification method according to claim 1, characterized in that the data set preparation, in which the convolutional neural network structure is designed according to the scale of the data set and the image content, further includes:

scaling image samples to a fixed size and flattening the pixels into a vector that serves as the input of the convolutional neural network.

4. The data-driven and task-driven image classification method according to claim 1, characterized in that extracting feature representations of the training-set images with the trained convolutional neural network includes:

inputting all training images into the trained convolutional neural network and taking the response of the last fully connected layer as the feature representation of each training image.

5. The data-driven and task-driven image classification method according to claim 4, characterized in that the feature representations of the training-set images are built into a KD-tree and stored in advance.

6. The data-driven and task-driven image classification method according to claim 1, characterized in that inputting a test image into the trained convolutional neural network and classifying the test image by the K-nearest-neighbor method includes:

for a given test image, scaling the image to the input size of the convolutional neural network model and feeding it into the network for forward computation; taking the response of the last fully connected layer as the feature representation of the test image; using this representation for K-nearest-neighbor retrieval among the feature representations of the training-set images; and predicting the class that occurs most often among the K training images with the nearest feature representations as the class of the test image.

7. The data-driven and task-driven image classification method according to claim 1, characterized in that the gradient-based optimization method used for network training is one of the following: stochastic gradient descent, the conjugate gradient method, a quasi-Newton method, or L-BFGS.