CN105740879A

CN105740879A - Zero-sample image classification method based on multi-mode discriminant analysis

Info

Publication number: CN105740879A
Application number: CN201610026972.7A
Authority: CN
Inventors: 冀中 (Ji Zhong); 谢于中 (Xie Yuzhong)
Original assignee: Tianjin University
Current assignee: Orpheus Group Co.,Ltd.
Priority date: 2016-01-15
Filing date: 2016-01-15
Publication date: 2016-07-06
Anticipated expiration: 2036-01-15
Also published as: CN105740879B

Abstract

A zero-shot image classification method based on multi-modal discriminant analysis comprises the following steps: constructing matrices from the visual features of the training data and the semantic features of the corresponding categories; solving for a mapping matrix; learning weights α_i on a validation set; using the mapping matrix to map the visual features of the test data and the semantic features of the unseen categories into a common space; and classifying the test data. The invention finds a common space between the visual features of images and semantic features of multiple modalities and achieves higher accuracy in zero-shot image classification, making it an effective zero-shot image classification method. The method is simple and effective. Besides zero-shot image classification, it is also applicable to other multi-modal classification and retrieval problems.

Description

Zero-shot image classification method based on multi-modal discriminant analysis

Technical field

The present invention relates to zero-shot image classification, and in particular to a zero-shot image classification method based on multi-modal discriminant analysis, which uses multi-modal discriminant analysis to establish a connection between the visual space of images and the semantic space of image categories and thereby classifies images of unseen categories.

Background art

In a conventional image classification system, accurately recognizing a class of images requires labeled training data for that class. Such labels are often hard to obtain, and zero-shot image classification is an effective way to deal with missing class labels: it aims to imitate the human ability to recognize new categories without having seen any actual visual samples of them. A zero-shot image classification system uses labeled training data from the seen categories to establish a mapping between the visual space and the semantic space. Using this mapping, it then associates the visual features of the test data with the semantic features of the unseen categories and assigns each test sample the label of the semantically closest category.

In zero-shot image classification, test images and the names of the unseen categories must be linked through a semantic space, in which each category name is represented as a high-dimensional vector. In earlier work this semantic space was usually attribute-based, so each category name is represented by an attribute vector. For example, Lampert et al. annotated 50 animal classes with 85 semantic attributes, such as the color and shape of the object, and used them as a high-level semantic description.

In recent years, with the development of natural language processing, semantic spaces based on text vectors have become increasingly popular. A commonly used text-vector extraction method is Word2Vec, proposed by Mikolov et al. It is an unsupervised method that represents the words of a corpus as vectors, and the similarity between these vectors closely reflects the semantic similarity between the words.
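The statement that vector similarity reflects semantic similarity can be illustrated with a minimal sketch that computes pairwise cosine similarities between category vectors. The category names and 4-dimensional vectors below are hypothetical toy values, not real Word2Vec or attribute embeddings; with real data each row would be one category's word vector or attribute vector.

```python
import numpy as np

def cosine_similarity_matrix(E: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of E (one category vector per row)."""
    normed = E / np.linalg.norm(E, axis=1, keepdims=True)
    return normed @ normed.T

# Hypothetical toy category vectors (illustration only, not real Word2Vec output).
class_names = ["zebra", "horse", "whale"]
E = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.2, 0.0],
    [0.0, 0.1, 0.9, 0.8],
])
sims = cosine_similarity_matrix(E)
```

With these toy values the related pair (zebra, horse) scores about 0.99, while (zebra, whale) scores much lower, at about 0.12.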

Once the semantic feature vectors of the seen and unseen categories have been obtained in a given semantic space, the semantic relatedness between categories follows from the distances between their vectors. Images, however, are represented by visual feature vectors in a visual space, and because of the semantic gap these cannot be related directly to vectors in the semantic space. Most existing methods therefore use the visual features of seen-category images and the semantic features of their labels to learn a mapping function from the visual space to the semantic space. This function maps the visual features of a test image into the semantic space to obtain a predicted semantic feature, and the nearest unseen-category semantic feature then determines the category of the image.
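The conventional pipeline just described can be sketched as follows. The source does not say which mapping function existing methods learn; this sketch assumes a linear map fitted by ridge regression (one common choice) and uses cosine similarity for the nearest-neighbour search. The function names `fit_visual_to_semantic` and `predict_unseen`, the regularization `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_visual_to_semantic(X, S, lam=1.0):
    """Ridge regression: M minimises ||X M - S||^2 + lam * ||M||^2.

    X: (n, d_vis) visual features of seen-category training images.
    S: (n, d_sem) semantic vector of each image's category label.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def predict_unseen(X_test, M, unseen_sem):
    """Map test images into the semantic space and return, for each one, the index
    of the unseen category whose semantic vector has the largest cosine similarity."""
    P = X_test @ M
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    U = unseen_sem / np.linalg.norm(unseen_sem, axis=1, keepdims=True)
    return np.argmax(P @ U.T, axis=1)

# Synthetic data: visual features are a hidden linear function of the semantic vectors plus noise.
rng = np.random.default_rng(0)
d_sem, d_vis = 5, 20
A = rng.normal(size=(d_sem, d_vis))
seen_sem = rng.normal(size=(6, d_sem))     # 6 seen categories
unseen_sem = rng.normal(size=(2, d_sem))   # 2 unseen categories, never used in training

y_train = np.repeat(np.arange(6), 30)
X_train = seen_sem[y_train] @ A + 0.1 * rng.normal(size=(y_train.size, d_vis))
M = fit_visual_to_semantic(X_train, seen_sem[y_train], lam=0.1)

y_test = np.repeat(np.arange(2), 20)
X_test = unseen_sem[y_test] @ A + 0.1 * rng.normal(size=(y_test.size, d_vis))
pred = predict_unseen(X_test, M, unseen_sem)
```

The mapping is learned only from seen categories; the unseen categories enter solely through their semantic vectors at prediction time, which is what makes the setting zero-shot.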

However, a semantic space built from the semantic features of a single modality often cannot fully describe the category structure of a dataset.

A common zero-shot image classification approach maps the visual features of images into the semantic feature space of the category names and then classifies them. However, the original space formed by the semantic features of the category names often does not describe the category structure of the dataset well. Two improvements are therefore possible: first, map both the visual features and the semantic features into a common space and relate them there; second, use semantic features from multiple modalities to describe the category structure of the dataset from several perspectives. Multi-modal discriminant analysis meets both requirements at once.

Summary of the invention

The technical problem to be solved by the present invention is to provide a zero-shot image classification method based on multi-modal discriminant analysis that maps the visual features of training images and the semantic features of image category names into a common space.

The technical solution adopted by the present invention is a zero-shot image classification method based on multi-modal discriminant analysis, comprising the following steps:

1) Using the visual features X_1 of the training data and the semantic features X_2, …, X_v of the corresponding categories, construct matrices S and D, where

S_{jr} = \begin{cases} \sum_{i=1}^{c} \left( \sum_{k=1}^{n_{ij}} x_{ijk} x_{ijk}^{T} - \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ij}^{(x)T} \right), & j = r \\ -\sum_{i=1}^{c} \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ir}^{(x)T}, & j \neq r \end{cases} \qquad (1)

D_{jr} = \sum_{i=1}^{c} \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ir}^{(x)T} - \frac{1}{n} \left( \sum_{i=1}^{c} n_{ij} \mu_{ij}^{(x)} \right) \left( \sum_{i=1}^{c} n_{ir} \mu_{ir}^{(x)} \right)^{T} \qquad (2)

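where x is a vector of the visual or semantic feature matrix; i is the category index, j and r are modality indices, and k is the sample index; c is the total number of categories, v the number of modalities, and n the total number of samples; n_ij is the number of samples of category i in modality j, and n_i = Σ_j n_ij; and the mean μ_ij^(x) of the samples of category i in modality j is expressed as:

\mu_{ij}^{(x)} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk}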

2) Solve the following problem to obtain the mapping matrix W, formed by stacking the modality-specific projection matrices W_1, W_2, …, W_v:

\max_{W_{1}, W_{2}, \ldots, W_{v}} \operatorname{Tr}\left( \frac{W^{T} D W}{W^{T} S W} \right) \qquad (3)

3) Learn the weights α_i in the following formula on a validation set:

k^{*} = \arg\max_{k} \left[ \sum_{i=2}^{v} \alpha_{i} \, \mathrm{sim}\left( W_{1}^{T} x_{j}, W_{i}^{T} y_{i}^{k} \right) \right], \quad k = 1, 2, \ldots, n \qquad (4)

where x_j is the visual feature of a validation sample, y_i^k is the semantic feature of category k in the i-th modality (modality 1 being the visual features), and sim(a, b) = a^T b / (‖a‖ ‖b‖) is the cosine similarity between two vectors;

4) Using the mapping matrix W, map the visual features of the test data and the semantic features y^k of the unseen categories into the common space;

5) Classify the test data with the formula of step 3); the resulting k^* is the category of the test data.
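Steps 1) and 2) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each modality is given as a sample matrix with category labels (the visual modality has many images per seen category; each semantic modality contributes one vector per category), the blocks of S and D follow Eqs. (1) and (2), with the cross terms of D_jr pairing μ_ij with μ_ir so that every block is d_j × d_r, and the trace-ratio problem (3) is relaxed to the common ratio-trace form, solved as the generalized eigenproblem D w = λ S w with a small ridge added to S. The function names and the parameters `reg` and `dim` are hypothetical.

```python
import numpy as np

def mvda_scatter(views, labels, n_classes):
    """Build the block matrices S (Eq. 1) and D (Eq. 2).

    views[j]: (n_j, d_j) samples of modality j; labels[j]: their category ids in 0..c-1.
    Returns S, D of shape (sum d_j, sum d_j) and the block offsets.
    """
    v = len(views)
    offs = np.concatenate([[0], np.cumsum([X.shape[1] for X in views])])
    counts = np.zeros((n_classes, v))            # counts[i, j] = n_ij
    means = []                                   # means[j][i] = mu_ij
    for j, (X, y) in enumerate(zip(views, labels)):
        mu = np.zeros((n_classes, X.shape[1]))
        for i in range(n_classes):
            Xi = X[y == i]
            counts[i, j] = len(Xi)
            if len(Xi) > 0:
                mu[i] = Xi.mean(axis=0)
        means.append(mu)
    n_i = counts.sum(axis=1)                     # samples of category i over all modalities
    n = n_i.sum()
    S = np.zeros((offs[-1], offs[-1]))
    D = np.zeros_like(S)
    for j in range(v):
        for r in range(v):
            coef = counts[:, j] * counts[:, r] / np.maximum(n_i, 1)  # n_ij n_ir / n_i
            cross = (means[j] * coef[:, None]).T @ means[r]         # sum_i coef_i mu_ij mu_ir^T
            sum_j = views[j].sum(axis=0)                             # = sum_i n_ij mu_ij
            sum_r = views[r].sum(axis=0)
            D[offs[j]:offs[j + 1], offs[r]:offs[r + 1]] = cross - np.outer(sum_j, sum_r) / n
            if j == r:
                block = views[j].T @ views[j] - cross                # sum_k x x^T minus mean term
            else:
                block = -cross
            S[offs[j]:offs[j + 1], offs[r]:offs[r + 1]] = block
    return S, D, offs

def mvda_fit(views, labels, n_classes, dim, reg=1e-3):
    """Ratio-trace relaxation of Eq. (3): top generalized eigenvectors of D w = lambda S w."""
    S, D, offs = mvda_scatter(views, labels, n_classes)
    L = np.linalg.cholesky(S + reg * np.eye(S.shape[0]))  # S is PSD; the ridge makes it PD
    L_inv = np.linalg.inv(L)
    C = L_inv @ D @ L_inv.T                                # equivalent standard symmetric problem
    _, U = np.linalg.eigh((C + C.T) / 2)                   # eigenvalues in ascending order
    W = L_inv.T @ U[:, ::-1][:, :dim]                      # back-transform the top `dim` vectors
    return [W[offs[j]:offs[j + 1]] for j in range(len(views))]

# Hypothetical toy data: 3 seen categories, 10-d visual features, 4-d semantic vectors.
rng = np.random.default_rng(1)
c = 3
y_vis = np.repeat(np.arange(c), 20)
X_vis = rng.normal(size=(c, 10))[y_vis] + 0.3 * rng.normal(size=(len(y_vis), 10))
X_sem = rng.normal(size=(c, 4))          # one semantic vector per category
y_sem = np.arange(c)
W_vis, W_sem = mvda_fit([X_vis, X_sem], [y_vis, y_sem], n_classes=c, dim=2)
```

Returning one block per modality mirrors the split of W into W_1, …, W_v that step 4) uses to project visual and semantic features into the common space.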

The zero-shot image classification method based on multi-modal discriminant analysis of the present invention has the following advantages:

1. Conventional methods can only find a common space between the visual features of images and semantic features of a single modality, whereas the multi-modal discriminant analysis of the present invention finds a common space between the visual features of images and semantic features of multiple modalities.

2. The semantic features of multiple modalities used in the present invention describe each category name from different perspectives and therefore describe it better. Experiments verify that, compared with methods that use only single-modality semantic features, the method of the present invention achieves higher accuracy in zero-shot image classification, making it an effective zero-shot image classification method.

3. The method of the present invention is simple and effective. Besides zero-shot image classification, it is also applicable to other multi-modal classification and retrieval problems.

Detailed description of the invention

The zero-shot image classification method based on multi-modal discriminant analysis of the present invention is described in detail below with reference to an embodiment.

Zero-shot image classification is an image classification problem in machine learning. A classification problem consists of learning a classifier from a known training dataset and then using that classifier to classify new input instances. Zero-shot image classification is also a classification problem, except that the categories of the new test data do not appear in the training dataset. The present invention uses multi-modal discriminant analysis to establish a connection between the visual space of images and the semantic space of image categories, and thereby performs zero-shot image classification.

The present invention uses multi-modal discriminant analysis to provide an effective zero-shot image classification method. The method maps the visual features of training images and the semantic features of image category names into a common space, in which the distances between the mapped visual features and the mapped semantic features can be compared directly, giving a better solution to the zero-shot image classification problem. In this common space, the visual features of an image correspond closely to the semantic features of its category. For a new test image, its visual features are mapped into the common space, and the closest unseen-category semantic feature determines the category of the test image.
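The classification rule of Eq. (4) can be sketched as below: project the test visual features with W_1 and each unseen category's semantic features with W_i, then pick the category with the largest α-weighted sum of cosine similarities. This is a minimal sketch; the function names, the argument layout, and the toy inputs (identity projections, two unseen categories, two semantic modalities) are hypothetical.

```python
import numpy as np

def _row_normalize(A):
    """Scale each row to unit L2 norm so that dot products become cosine similarities."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def classify_common_space(x_vis, W_vis, sem_by_modality, W_sem, alphas):
    """k* = argmax_k sum_i alpha_i * sim(W_1^T x, W_i^T y_i^k), as in Eq. (4).

    x_vis: (m, d_1) test visual features; W_vis: (d_1, dim) projection W_1.
    sem_by_modality: list over semantic modalities of (K, d_i) arrays,
        row k = semantic feature of unseen category k in that modality.
    W_sem: matching list of (d_i, dim) projections; alphas: one weight per modality.
    """
    z = _row_normalize(x_vis @ W_vis)
    n_categories = sem_by_modality[0].shape[0]
    score = np.zeros((x_vis.shape[0], n_categories))
    for Y, W_i, a in zip(sem_by_modality, W_sem, alphas):
        score += a * (z @ _row_normalize(Y @ W_i).T)   # weighted cosine similarities
    return np.argmax(score, axis=1)

# Hypothetical toy example: identity projections into a 2-d common space.
I2 = np.eye(2)
Y_mod2 = np.array([[1.0, 0.0], [0.0, 1.0]])
Y_mod3 = np.array([[1.0, 0.1], [0.1, 1.0]])
x_test = np.array([[2.0, 0.1], [0.1, 3.0]])
pred = classify_common_space(x_test, I2, [Y_mod2, Y_mod3], [I2, I2], alphas=[0.5, 0.5])
```

Because the similarity is cosine-based, only the direction of each projected vector matters, so differences in feature scale between modalities do not affect the ranking.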

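The zero-shot image classification method based on multi-modal discriminant analysis of the present invention comprises the following steps:

1) Using the visual features X_1 of the training data and the semantic features X_2, …, X_v of the corresponding categories, construct matrices S and D, where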

S_{jr} = \begin{cases} \sum_{i=1}^{c} \left( \sum_{k=1}^{n_{ij}} x_{ijk} x_{ijk}^{T} - \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ij}^{(x)T} \right), & j = r \\ -\sum_{i=1}^{c} \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ir}^{(x)T}, & j \neq r \end{cases} \qquad (1)

D_{jr} = \sum_{i=1}^{c} \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ir}^{(x)T} - \frac{1}{n} \left( \sum_{i=1}^{c} n_{ij} \mu_{ij}^{(x)} \right) \left( \sum_{i=1}^{c} n_{ir} \mu_{ir}^{(x)} \right)^{T} \qquad (2)

2) Solve the following problem to obtain the mapping matrix W, formed by stacking the modality-specific projection matrices W_1, W_2, …, W_v:

\max_{W_{1}, W_{2}, \ldots, W_{v}} \operatorname{Tr}\left( \frac{W^{T} D W}{W^{T} S W} \right) \qquad (3)

3) Learn the weights α_i in the following formula on a validation set:

k^{*} = \arg\max_{k} \left[ \sum_{i=2}^{v} \alpha_{i} \, \mathrm{sim}\left( W_{1}^{T} x_{j}, W_{i}^{T} y_{i}^{k} \right) \right], \quad k = 1, 2, \ldots, n \qquad (4)
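The source states only that the weights α_i are learned on a validation set, not how. One simple possibility, assumed here, is a grid search over non-negative weights that sum to one, keeping the setting with the highest validation accuracy. In this sketch `score_list` holds, for each semantic modality, the matrix of cosine similarities between the projected validation samples and the projected category features (the bracketed terms of Eq. (4) without α); the function name `learn_alphas`, the grid `step`, and the toy scores are hypothetical.

```python
import itertools
import numpy as np

def learn_alphas(score_list, y_val, step=0.1):
    """Grid search over weights alpha_i >= 0 with sum 1, maximising validation accuracy.

    score_list[i]: (m, K) similarity matrix for semantic modality i.
    y_val: (m,) true category index of each validation sample.
    """
    grid = np.round(np.arange(0.0, 1.0 + 1e-9, step), 10)
    best_acc, best_alpha = -1.0, None
    for alpha in itertools.product(grid, repeat=len(score_list)):
        if not np.isclose(sum(alpha), 1.0):      # stay on the probability simplex
            continue
        combined = sum(a * s for a, s in zip(alpha, score_list))
        acc = float(np.mean(np.argmax(combined, axis=1) == y_val))
        if acc > best_acc:                        # first best setting wins ties
            best_acc, best_alpha = acc, np.array(alpha)
    return best_alpha, best_acc

# Hypothetical toy scores: one modality ranks the true category first, the other never does.
y_val = np.array([0, 1, 0, 1])
good = np.eye(2)[y_val]
bad = np.eye(2)[1 - y_val]
alpha, acc = learn_alphas([good, bad], y_val)
```

The grid grows as (1/step + 1) to the power of the number of modalities, so this exhaustive search is only practical for a few semantic modalities.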

Claims

1. A zero-shot image classification method based on multi-modal discriminant analysis, characterized by comprising the following steps:

1) Using the visual features X_1 of the training data and the semantic features X_2, …, X_v of the corresponding categories, construct matrices S and D, where

S_{jr} = \begin{cases} \sum_{i=1}^{c} \left( \sum_{k=1}^{n_{ij}} x_{ijk} x_{ijk}^{T} - \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ij}^{(x)T} \right), & j = r \\ -\sum_{i=1}^{c} \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ir}^{(x)T}, & j \neq r \end{cases} \qquad (1)

D_{jr} = \sum_{i=1}^{c} \frac{n_{ij} n_{ir}}{n_{i}} \mu_{ij}^{(x)} \mu_{ir}^{(x)T} - \frac{1}{n} \left( \sum_{i=1}^{c} n_{ij} \mu_{ij}^{(x)} \right) \left( \sum_{i=1}^{c} n_{ir} \mu_{ir}^{(x)} \right)^{T} \qquad (2)

2) Solve the following problem to obtain the mapping matrix W, formed by stacking the modality-specific projection matrices W_1, W_2, …, W_v:

\max_{W_{1}, W_{2}, \ldots, W_{v}} \operatorname{Tr}\left( \frac{W^{T} D W}{W^{T} S W} \right) \qquad (3)

3) Learn the weights α_i in the following formula on a validation set:

k^{*} = \arg\max_{k} \left[ \sum_{i=2}^{v} \alpha_{i} \, \mathrm{sim}\left( W_{1}^{T} x_{j}, W_{i}^{T} y_{i}^{k} \right) \right], \quad k = 1, 2, \ldots, n \qquad (4)