CN106127247B

CN106127247B - Image classification method based on a multi-task multi-instance support vector machine

Info

Publication number: CN106127247B
Application number: CN201610466376.0A
Authority: CN
Inventors: 阮奕邦; 肖燕珊; 刘波; 郝志峰; 黎启祥
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2016-06-21
Filing date: 2016-06-21
Publication date: 2019-07-09
Anticipated expiration: 2036-06-21
Also published as: CN106127247A

Abstract

The invention discloses an image classification method based on a multi-task multi-instance support vector machine. The method comprises: establishing T learning tasks for T groups of images; converting the images of the T learning tasks into multi-instance form; constructing one class bag for the images of each category in the T tasks; establishing the Euclidean distance formula from an instance in a class bag to a multi-instance bag; constructing the instance distance vector from a class bag to a multi-instance bag; establishing the weighted Euclidean distance formula from a class bag to a multi-instance bag; constraining the distance from a multi-instance bag to its own category to be smaller than its distance to other categories; establishing the optimization problem of the multi-task multi-instance support vector machine; converting the optimization problem into a conventional single-task single-instance support vector machine problem; and solving the support vector machine optimization problem. The invention relates to a method for optimizing the weighted Euclidean distance formula: by formulating a multi-task multi-instance support vector machine learning problem over image instances, suitable weights are obtained by optimization, thereby improving the performance of the image classifier.

Description

Image classification method based on a multi-task multi-instance support vector machine

Technical field

The present invention relates to the technical field of image classification, and more particularly to an image classification method based on a multi-task multi-instance support vector machine.

Background technique

With the progress of information technology and the continued growth of social networks, a massive number of images already exists on the internet, the number of images newly uploaded each day is rising exponentially, and the scenes contained in images are becoming richer. Although social networking sites have developed considerably, the massive number of pictures on these sites is not fully utilized, and a large number of new images is uploaded every day. How to recognize unlabeled images and classify them accurately into the corresponding categories, so as to better serve site users, is a problem that most internet companies are studying.

On the one hand, because an image may capture various background elements when it is taken, an image may contain more than one scene. If a traditional single-instance image recognition method is used, such as a single-instance support vector machine, misclassification may result. For example, when photographs are taken at a zoo, different species may appear in the same image at the same time: people, horses, birds and other animals may all be in the same image.

On the other hand, owing to the openness of the internet and the diversity of capture devices, photos of the same person may appear on different social networking sites, may have been taken by different devices, or may have been clipped from different videos. Mixing these pictures together for recognition is clearly unreasonable. Furthermore, improving the performance of an image classifier requires a large number of labeled images for training; if training samples are insufficient, classifier performance declines, which degrades image classification. Early image classification was carried out by manual labeling, but the labor cost of this approach is very high. It may be feasible for a small number of images, but it is not practical at the rate at which images are currently generated on the internet.

Summary of the invention

Although there are many labeled images of the same type on the internet, their sources differ, for example in the capture device or in the social networking site where they are stored, so mixing these pictures together to train a classifier is unreasonable. If training is instead grouped by source, each group may have too few training samples, which lowers classifier accuracy. A multi-task formulation can therefore be used: several groups of pictures are trained simultaneously, and the correlation between the groups is used to improve the performance of each group's classifier. In addition, because an image contains multiple scenes, treating the image as a single instance ignores the correlation between those scenes. A multi-instance learning method can be used instead, in which one image is regarded as multiple instances.

The image classification method based on a multi-task multi-instance support vector machine of the invention includes the following steps:

(1) Obtain several groups of images, ensuring that the number of images in each group is small; establish several learning tasks, one per group; and classify the images manually by hand labeling.

(2) Convert all images of all learning tasks into multi-instance data.

(3) In each multi-instance learning task, construct one associated multi-instance bag for each image category, called a class bag in the present invention, and establish the Euclidean distance formula from an instance in a class bag to a multi-instance bag.

(4) Construct the instance distance vector from a class bag to a multi-instance bag, and thereby establish the weighted Euclidean distance formula from a class bag to a multi-instance bag.

(5) Establish a constraint ensuring that the distance from a multi-instance bag to its own category is far smaller than its distance to other categories.

(6) Establish the optimization problem of the multi-task multi-instance support vector machine.

(7) Convert the multi-task multi-instance support vector machine optimization problem of step (6) into an optimization problem similar to that of a single-task single-instance support vector machine.

(8) Solve the support vector machine optimization problem of step (7) to obtain the optimized weights, thereby training an image classifier based on the multi-task multi-instance support vector machine, which is used to classify images.

Description of the drawings

Fig. 1 is a flow chart of the web page classification method of the invention based on maximum-margin multi-task multi-instance learning.

Specific embodiment

In the first step, several groups of images are obtained, with a small number of images in each group; several learning tasks are established, one per group; and the images are classified manually by hand labeling. For example, if there are T groups of images, T image classifier learning tasks are established, and because the number of images in each of the T tasks is small, hand labeling is feasible.

In the second step, all images of all learning tasks are converted into multi-instance data. An image contains multiple scenes, but classification needs only one of the key scenes. If the whole image is converted into a single instance for classification, the correlation between the scenes may be ignored and classification quality suffers, so a multi-instance learning method can be used for image classification. Before the multi-instance learning method is applied, the images must be converted into multi-instance data. A classical image segmentation method can be used for this; the present invention uses the Blobworld system to partition each image into regions. Features are then extracted from each image region, so that each region is converted into one instance. An image containing multiple regions is thus converted into multiple instances, and the image can then be called a multi-instance bag.
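The conversion of an image into a multi-instance bag can be illustrated with a minimal sketch. It does not implement Blobworld segmentation: as a stand-in, it assumes a grayscale image given as a list of pixel rows, splits it into a fixed grid, and describes each grid cell by its mean and standard deviation. The function name `image_to_bag`, the grid partition and the two-number feature are all illustrative assumptions, not part of the patent.

```python
from statistics import mean, pstdev


def image_to_bag(image, rows=2, cols=2):
    """Turn one grayscale image into a multi-instance bag.

    ASSUMPTION: a fixed rows x cols grid replaces Blobworld regions, and
    each region is described by (mean, standard deviation) of its pixels.
    `image` is a list of equal-length rows of numbers, with at least
    `rows` pixel rows and `cols` pixel columns so that no cell is empty.
    """
    height, width = len(image), len(image[0])
    bag = []
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            pixels = [image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right)]
            # One region becomes one instance (a feature vector).
            bag.append((mean(pixels), pstdev(pixels)))
    return bag
```

Each returned tuple plays the role of one instance, and the returned list plays the role of the multi-instance bag for that image. A real system would substitute a proper segmentation and richer region features.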

In the third step, in each multi-instance learning task, one associated multi-instance bag is constructed for each image category; this bag is called a class bag in the present invention, and the Euclidean distance formula from an instance in a class bag to a multi-instance bag is established. Unlike traditional multi-instance methods, the present invention does not directly consider the distance between one image and another. Instead, all images of each category are combined to form a class-level multi-instance bag, called a class bag, and the Euclidean distance formula from an instance in the class bag to a multi-instance bag is established as follows:

In the above formula, the instance is the j-th instance of class bag C_kt, the center term is the center of multi-instance bag B_it, and n_kt is the number of instances in class bag C_kt.
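A minimal sketch of this instance-to-bag distance follows. It assumes that the center of a multi-instance bag is the component-wise mean of its instances and that the distance is the squared Euclidean distance to that center; the text does not state either detail explicitly, and the function names are hypothetical.

```python
def bag_center(bag):
    """Component-wise mean of the instances in a bag.

    ASSUMPTION: 'center' is read as the arithmetic mean of the instances.
    """
    n = len(bag)
    return [sum(column) / n for column in zip(*bag)]


def instance_to_bag_distance(instance, bag):
    """Squared Euclidean distance from one class-bag instance to the bag center.

    ASSUMPTION: the squared form is used; take a square root if the plain
    Euclidean distance is intended.
    """
    center = bag_center(bag)
    return sum((a - b) ** 2 for a, b in zip(instance, center))
```

For example, with the bag `[[0, 0], [2, 2]]` the center is `[1.0, 1.0]`, so the instance `[4, 5]` lies at squared distance 9 + 16 = 25 from it.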

In the fourth step, the instance distance vector from a class bag to a multi-instance bag is constructed, and the weighted Euclidean distance formula from a class bag to a multi-instance bag is thereby established. The third step yields the distance from each instance of a class bag to a multi-instance bag. Taking these distances as vector elements gives the instance distance vector from the class bag to the multi-instance bag. The instance distance vector from the k-th class bag of the t-th task to the i-th multi-instance bag is as follows:

A weight vector w_kt of the same length as the instance distance vector is then established, defined as follows:

Multiplying the instance distance vector by the weight vector w_kt gives the weighted Euclidean distance formula from the class bag to the multi-instance bag:
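The two constructions of this step can be sketched as follows, under the same assumptions as before (bag center taken as the mean instance, squared Euclidean distance) and reading the product of the two vectors as an inner product. The function names are illustrative only.

```python
def distance_vector(class_bag, bag):
    """Distances from every instance of a class bag to the center of `bag`.

    The result has one entry per class-bag instance, so its length is n_kt.
    ASSUMPTION: center = mean instance, distance = squared Euclidean.
    """
    n = len(bag)
    center = [sum(column) / n for column in zip(*bag)]
    return [sum((a - b) ** 2 for a, b in zip(instance, center))
            for instance in class_bag]


def weighted_distance(weights, distances):
    """Inner product of the weight vector w_kt and the instance distance vector."""
    if len(weights) != len(distances):
        raise ValueError("weight vector and distance vector must have equal length")
    return sum(w * d for w, d in zip(weights, distances))
```

With a class bag `[[1, 1], [4, 5]]` and a bag `[[0, 0], [2, 2]]`, the distance vector is `[0.0, 25.0]`, and the weights `[1.0, 0.5]` give a weighted class-to-bag distance of 12.5. Learning amounts to choosing these weights so that informative class-bag instances count more than background ones.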

In the fifth step, a constraint is established ensuring that the distance from a multi-instance bag to its own category is far smaller than its distance to other categories. The following constraint is established:

In the above formula, P_t(B_it) is the set of categories to which multi-instance bag B_it belongs, N_t(B_it) is the set of categories unrelated to multi-instance bag B_it, and the slack variable is an error term. The constraint ensures that the distance from category n to multi-instance bag B_it is greater than the distance from category p to multi-instance bag B_it.
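One way to read this constraint is as a margin condition of the usual support vector machine kind: the distance to an unrelated category n should exceed the distance to an own category p by at least a margin, with the error term absorbing any shortfall. The sketch below assumes a margin of 1 and a non-negative error term, neither of which is confirmed by the surviving text; the function name is hypothetical.

```python
def constraint_slack(dist_to_own, dist_to_other, margin=1.0):
    """Smallest non-negative error term that satisfies

        dist_to_other - dist_to_own >= margin - slack

    ASSUMPTION: hinge-style constraint with margin 1, as in a standard SVM.
    A result of 0.0 means the constraint already holds without any slack.
    """
    return max(0.0, margin - (dist_to_other - dist_to_own))
```

For instance, distances of 2.0 (own category) and 5.0 (other category) need no slack, while 2.0 and 2.5 leave a shortfall of 0.5, and a bag that is closer to the wrong category (3.0 versus 2.0) needs a slack of 2.0.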

In the sixth step, the optimization problem of the multi-task multi-instance support vector machine is established. In the t-th task, the weight vectors of all categories form one vector w_t, as follows:

Correspondingly, a vector of the same length is constructed. It is composed of the instance distance vector for category n and the negated instance distance vector for category p, with all other positions filled with 0, so the constraint established in the fifth step can be converted into the following form:

Based on this constraint, w_t is written in multi-task learning form, i.e. w_t = w₀ + v_t, where w₀ is the common weight coefficient shared by all tasks and v_t is the weight coefficient specific to each task. The optimization problem of the multi-task multi-instance support vector machine is thereby established as follows:

In the above formula, C_w controls the size of the error term, and the regularization parameters γ₀ and γ₁ control the similarity between the multi-instance learning tasks. If γ₀ tends to infinity, the classifiers trained by the individual multi-instance learning tasks are unrelated. Conversely, if γ₁ tends to infinity, the classifiers trained by all multi-instance learning tasks are identical or similar.
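A form of the optimization problem that is consistent with this description can be written down as a sketch. It is an assumption, not a transcription of the patent's formula: the symbols z_{it}^{pn} (the zero-padded vector built from the two instance distance vectors) and \xi_{it}^{pn} (the error term), the factors of 1/2, and the margin of 1 are all illustrative choices. It also assumes that all w_t have the same length, so that a shared w₀ is well defined.

```latex
\begin{aligned}
\min_{w_0,\,\{v_t\},\,\xi}\quad
  & \frac{\gamma_0}{2}\,\lVert w_0\rVert^2
    + \frac{\gamma_1}{2}\sum_{t=1}^{T}\lVert v_t\rVert^2
    + C_w \sum_{t=1}^{T}\sum_{i}\;\sum_{p\in P_t(B_{it})}\;\sum_{n\in N_t(B_{it})} \xi_{it}^{pn} \\
\text{s.t.}\quad
  & (w_0 + v_t)^{\top} z_{it}^{pn} \;\ge\; 1 - \xi_{it}^{pn},
    \qquad \xi_{it}^{pn} \ge 0 .
\end{aligned}
```

Under this form the limiting behavior matches the text: a very large γ₀ drives the shared part w₀ toward zero, leaving each task with only its own v_t, whereas a very large γ₁ drives every v_t toward zero, so all tasks share essentially the same weights w₀.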

In the seventh step, the multi-task multi-instance support vector machine optimization problem of the sixth step is converted into an optimization problem similar to that of a single-task single-instance support vector machine. So that numerical techniques such as quadratic programming can be used to solve the multi-task multi-instance support vector machine problem, the problem must be converted into a form similar to the traditional support vector machine optimization problem. Two vectors are therefore established as follows:

Using these two vectors, the multi-task multi-instance support vector machine of the sixth step can be converted into the standard support vector machine optimization problem form, as follows:
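A common way to perform this kind of conversion in regularized multi-task learning is to stack the shared and task-specific weights into one long vector and to map each constraint vector into a matching block layout. The sketch below follows that construction as an assumption about what the two vectors look like; it presumes a regularizer of the form γ₀‖w₀‖² + γ₁Σ‖v_t‖² and equal-length weight vectors across tasks. The names `stack_weights` and `feature_map` are hypothetical.

```python
import math


def stack_weights(w0, v, gamma0, gamma1):
    """Stacked weight vector [sqrt(g0)*w0, sqrt(g1)*v_1, ..., sqrt(g1)*v_T]."""
    stacked = [math.sqrt(gamma0) * x for x in w0]
    for v_t in v:
        stacked.extend(math.sqrt(gamma1) * x for x in v_t)
    return stacked


def feature_map(z, task, num_tasks, gamma0, gamma1):
    """Map a constraint vector z of task `task` (0-based) into the stacked space.

    Layout: [z/sqrt(g0), 0, ..., z/sqrt(g1) in block `task`, ..., 0].
    """
    mapped = [x / math.sqrt(gamma0) for x in z]
    for s in range(num_tasks):
        if s == task:
            mapped.extend(x / math.sqrt(gamma1) for x in z)
        else:
            mapped.extend([0.0] * len(z))
    return mapped


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))
```

With these definitions, the inner product of the stacked weights and the mapped vector equals (w₀ + v_t)·z, and the squared norm of the stacked weights equals γ₀‖w₀‖² + γ₁Σ‖v_t‖². The multi-task problem therefore takes the shape of an ordinary single-task support vector machine in the stacked space, which is what allows a standard solver to be applied.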

8th step solves the support vector machines optimization problem of the 7th step, the weight of optimization can be obtained, to train One Image Classifier based on the more example support vector machines of multitask, carries out the classification of image.

The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be regarded as falling within their scope.

Claims

1. An image classification method based on a multi-task multi-instance support vector machine, characterized in that it comprises the following steps:

First step: obtain several groups of images, establish several learning tasks, one per group, and classify the images manually by hand labeling;

Second step: convert all images of all learning tasks into multi-instance data;

Third step: in each multi-instance learning task, construct one associated set of multi-instance bags for each image category, the set of multi-instance bags being called a class bag, and establish the Euclidean distance formula from an instance in a class bag to a multi-instance bag;

Fourth step: construct the instance distance vector from a class bag to a multi-instance bag, and thereby establish the weighted Euclidean distance formula from a class bag to a multi-instance bag;

Fifth step: establish a constraint ensuring that the distance from a multi-instance bag to its own category is smaller than its distance to other categories;

Sixth step: establish the optimization problem of the multi-task multi-instance support vector machine;

Seventh step: convert the multi-task multi-instance support vector machine optimization problem of the sixth step into the optimization problem of a single-task single-instance support vector machine;

Eighth step: solve the support vector machine optimization problem of the seventh step to obtain the optimized weights, thereby training an image classifier based on the multi-task multi-instance support vector machine, which is used to classify images;

In the sixth step, the optimization problem of the multi-task multi-instance support vector machine is established; in the t-th task, the weight vectors of all categories form one vector w_t, as follows:

Correspondingly, a vector of the same length is constructed; it is composed of two instance distance vectors, with all other positions filled with 0, so the constraint established in the fifth step can be converted into the following form:

Based on this constraint, w_t is written in multi-task learning form, i.e. w_t = w₀ + v_t, where w₀ is the common weight coefficient shared by all tasks and v_t is the weight coefficient specific to each task; the optimization problem of the multi-task multi-instance support vector machine is thereby established as follows:

In the above formula, T is the number of tasks, C_w controls the size of the error term, and the regularization parameters γ₀ and γ₁ control the similarity between the multi-instance learning tasks; if γ₀ tends to infinity, the classifiers trained by the individual multi-instance learning tasks are unrelated; conversely, if γ₁ tends to infinity, the classifiers trained by all multi-instance learning tasks are identical or similar;

In the seventh step, the multi-task multi-instance support vector machine optimization problem of the sixth step is converted into the optimization problem of a single-task single-instance support vector machine; so that numerical techniques such as quadratic programming can be used to solve the multi-task multi-instance support vector machine problem, the problem must be converted into a form similar to the traditional support vector machine optimization problem, and two vectors are therefore established as follows:

Using the above two vectors, the multi-task multi-instance support vector machine of the sixth step can be converted into the standard support vector machine optimization problem form, as follows:

2. The image classification method based on a multi-task multi-instance support vector machine according to claim 1, characterized in that, in the third step, constructing in each multi-instance learning task one associated multi-instance bag for each image category, called a class bag, and establishing the Euclidean distance formula from an instance in a class bag to a multi-instance bag is specifically: all images of each category are combined to form a class-level multi-instance bag, called a class bag, and the Euclidean distance formula from an instance in the class bag to a multi-instance bag is established as follows:

Wherein the instance is the j-th instance of class bag C_kt, the center term is the center of multi-instance bag B_it, and n_kt is the number of instances in class bag C_kt.

3. The image classification method based on a multi-task multi-instance support vector machine according to claim 2, characterized in that, in the fourth step, the instance distance vector from a class bag to a multi-instance bag is constructed, and the weighted Euclidean distance formula from a class bag to a multi-instance bag is thereby established; the third step yields the distance from each instance of a class bag to a multi-instance bag, and taking these distances as vector elements gives the instance distance vector from the class bag to the multi-instance bag; the instance distance vector from the k-th class bag of the t-th task to the i-th multi-instance bag is as follows:

Multiplying the instance distance vector by the weight vector w_kt gives the weighted Euclidean distance formula from the class bag to the multi-instance bag:

4. The image classification method based on a multi-task multi-instance support vector machine according to claim 3, characterized in that, in the fifth step, a constraint is established ensuring that the distance from a multi-instance bag to its own category is smaller than its distance to other categories; the following constraint is established:

In the above formula, P_t(B_it) is the set of categories to which multi-instance bag B_it belongs, N_t(B_it) is the set of categories unrelated to multi-instance bag B_it, and the slack variable is an error term; the constraint ensures that the distance from category n to multi-instance bag B_it is greater than the distance from category p to multi-instance bag B_it.