CN113191381B - Image zero-shot classification model based on cross knowledge and classification method thereof - Google Patents

Image zero-shot classification model based on cross knowledge and classification method thereof Download PDF

Info

Publication number
CN113191381B
CN113191381B (application CN202011402935.4A)
Authority
CN
China
Prior art keywords
visual
semantic
features
level
cross
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011402935.4A
Other languages
Chinese (zh)
Other versions
CN113191381A (en)
Inventor
曾婷
向鸿鑫
谢诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN202011402935.4A priority Critical patent/CN113191381B/en
Publication of CN113191381A publication Critical patent/CN113191381A/en
Application granted granted Critical
Publication of CN113191381B publication Critical patent/CN113191381B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention discloses an image zero-shot classification model based on cross knowledge, which comprises a biological classification tree module for constructing a biological classification tree from all categories in the data set; a visual feature extraction module for converting the images in the data set into one-dimensional visual features; a semantic feature extraction module for converting the texts or attributes in the data set into one-dimensional semantic features; a cross knowledge learning module for enriching the semantic information of the categories; and a generative adversarial network module comprising a generator and a discriminator, wherein the generator generates pseudo-visual features from semantic features and the discriminator discriminates the authenticity and the category of an image. Cross knowledge learning trains more relevant semantic features, improving the semantic-to-visual feature embedding in ZSL and enriching the semantic features in the cross-modal learning process. The model and the method are simple and efficient, and achieve high-accuracy classification results on several authoritative data sets.

Description

Image zero-shot classification model based on cross knowledge and classification method thereof
Technical Field
The invention relates to the technical field of image classification, in particular to an image zero-shot classification model based on cross knowledge and a classification method thereof.
Background
The field of image classification has become increasingly attractive with the rapid expansion of data sizes and the explosion of machine learning models; however, collecting sufficient data sets is time-consuming and laborious, and some data sets are simply unavailable. How to classify certain categories correctly and efficiently when part of the data set is missing has become one of the main challenges facing the image classification field.
To address the problem of incomplete data sets, the mainstream line of work in this field first proposed the concept of Zero-Shot Learning (ZSL). ZSL can identify, at test time, new classes that did not appear in the training phase, i.e. it handles the situation where the labeled training samples do not cover all object classes. Zero-shot classification reduces the image classification problem with missing samples to a conventional image classification problem. Generative ZSL attempts to learn the relationship between semantic features and visual features from the seen classes and then to synthesize visual features for the unseen classes. Currently, the mainstream generative zero-shot image classification methods include FeatGen, GAZSL, ZSLPP, CIZSL, etc., which use a generative adversarial network as the basic structure of the deep model and reduce the task to a conventional image classification task by generating pseudo samples. For example, GDAN (Generative Dual Adversarial Network) uses a dual generative adversarial network to accomplish the semantic-to-visual bidirectional mapping.
In the field of animal and plant image classification, although the above methods have achieved good results even without a sufficient training set, two challenges remain:
1. Cross-modality problem: the cross-modal gap in visual-semantic embedding causes semantic features and visual features to be expressed incompletely during embedding. In particular, two very similar classes may be indistinguishable in the embedding space, so the model struggles to tell them apart and its performance degrades considerably;
2. Cross-domain problem: the seen classes may intersect the unseen classes very little (or not at all), and the visual appearance of the same attribute or text description may differ significantly across unseen classes, which makes it difficult for the model to distinguish the unseen classes accurately when embedding semantic vectors into the visual space.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an image zero-shot classification model based on cross knowledge and a classification method thereof.
In order to achieve the purpose of the invention, the invention is realized by the following technical scheme:
A zero-shot image classification model based on cross knowledge, comprising:
A biological classification tree module: constructing a biological classification tree according to all categories in the data set;
a visual feature extraction module: used for converting the images in the data set into one-dimensional visual features;
a semantic feature extraction module: used for converting the texts or attributes in the data set into one-dimensional semantic features;
a cross knowledge learning module: used for enriching the semantic information of the categories;
a generative adversarial network module: comprising a generator and a discriminator, wherein the generator generates pseudo-visual features from semantic features, and the discriminator discriminates the authenticity and the category of an image.
Preferably, the biological classification tree includes a Family level, a Genus level and a Species level, the Species level including all categories in the data set.
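For illustration, a minimal sketch of how such a three-level tree might be represented is given below; the species-to-genus-to-family mapping is a hypothetical example, not data from the patent.

```python
# Minimal sketch of a Family -> Genus -> Species tree (assumed structure;
# the taxonomy mapping below is a hypothetical example).
from collections import defaultdict

# species -> (genus, family), e.g. derived from the class names in the data set
taxonomy = {
    "Indigo Bunting": ("Passerina", "Cardinalidae"),
    "Lazuli Bunting": ("Passerina", "Cardinalidae"),
    "Painted Bunting": ("Passerina", "Cardinalidae"),
}

def build_tree(taxonomy):
    """Group all data set categories (Species level) under Genus and Family nodes."""
    tree = defaultdict(lambda: defaultdict(list))
    for species, (genus, family) in taxonomy.items():
        tree[family][genus].append(species)
    return tree

tree = build_tree(taxonomy)
# tree["Cardinalidae"]["Passerina"] -> ["Indigo Bunting", "Lazuli Bunting", "Painted Bunting"]
```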
Preferably, the visual feature extraction module adopts ResNet101, and the semantic feature extraction module adopts Term Frequency-Inverse Document Frequency (TF-IDF).
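A sketch of the two extractors follows, assuming standard ImageNet preprocessing and default TF-IDF settings; the patent does not specify these details.

```python
# Sketch of the visual (ResNet101) and semantic (TF-IDF) extractors.
# Preprocessing and vectorizer settings are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.feature_extraction.text import TfidfVectorizer

# Visual branch: ResNet101 with its classification head removed -> 2048-d vectors.
resnet = models.resnet101(weights="DEFAULT")
backbone = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def visual_features(pil_images):
    batch = torch.stack([preprocess(im) for im in pil_images])
    return backbone(batch).flatten(1)          # (N, 2048) one-dimensional features

# Semantic branch: TF-IDF over per-class Wikipedia articles (or attribute text).
def semantic_features(class_documents):
    vec = TfidfVectorizer()
    return vec.fit_transform(class_documents).toarray()  # (num_classes, vocab_size)
```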
Preferably, the cross knowledge learning module merges the classes into the Family level and the Genus level of the biological classification tree using biological taxonomy, and cross learning is performed within the Family level, the Genus level and the Species level respectively.
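The patent describes the cross-selection only at a high level; the sketch below shows one plausible reading, pairing the semantic vector of one class with the visual vector of a sibling class that shares the same Family or Genus node. Treat it as an assumed interpretation, not the definitive implementation.

```python
# Hedged sketch of cross-selection within one taxonomy level.
import random

def cross_pairs(node_to_classes, sem, vis):
    """node_to_classes: node id -> list of class ids sharing that Family/Genus node;
    sem: class id -> semantic vector; vis: class id -> list of visual vectors."""
    pairs = []
    for classes in node_to_classes.values():
        for c in classes:
            # pick a sibling class in the same node (fall back to the class itself)
            sibling = random.choice([s for s in classes if s != c] or [c])
            pairs.append((sem[c], random.choice(vis[sibling])))
    return pairs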
Preferably, the generator loss is expressed as

L_G = -E_{z~P_z}[D_ω(G_θ(t_s, A, z))] + L_cls(D_ω(G_θ(t_s, A, z))) + L_T

[The discriminator loss L_D, the term L_T, and the quantity K_{s,F} are given only as formula images in the original publication.]
preferably, the generator employs taxonomic regularization.
Preferably, the data set is animal and plant image data.
Preferably, the training method of the model comprises the following steps:
step S1: constructing a biological classification tree according to all class names in the data set;
step S2: respectively inputting the images and the texts or attribute descriptions in the data set into the visual feature extraction module and the semantic feature extraction module, and extracting visual vectors and semantic vectors;
step S3: constructing visual feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the visual vectors;
step S4: constructing semantic feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the semantic vectors;
step S5: initializing the discriminator and the generator with taxonomic regularization;
step S6: cross-selecting semantic features and visual features from the visual feature data set and the semantic feature data set of the same level, and cross-training with the losses L_G and L_D of the generative adversarial network;
step S7: inputting the semantic features into the generator to obtain pseudo-visual features, and calculating the TR regularization term from the pseudo-visual features and the visual features;
step S8: inputting the pseudo-visual features and the visual features into the discriminator, and calculating the classification loss and the real/fake loss of L_D;
step S9: calculating gradients from TR, the classification loss and the real/fake loss, and updating L_G and L_D;
step S10: iterating S6-S9 until the termination condition is reached.
Preferably, the termination condition is the number of iterations set before training, the number of iterations being 5000 to 10000. A minimal training-loop sketch covering steps S6-S9 is given below.
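In the sketch, the network architectures, the WGAN-style real/fake loss, the noise dimension (128) and the TR weight are all assumptions not fixed by the patent text; the discriminator D is assumed to return a (real/fake score, class logits) pair, and tr_fn can be a closure over the taxonomic_regularizer sketched earlier.

```python
# Hedged sketch of one training iteration (steps S6-S9); loss forms beyond the
# names L_G, L_D, classification, real/fake and TR are assumptions.
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, sem, vis, labels, tr_fn, lambda_tr=0.1):
    z = torch.randn(sem.size(0), 128)            # noise input to the generator
    fake = G(torch.cat([sem, z], dim=1))         # pseudo-visual features (S7)

    # --- discriminator update: real/fake loss + classification loss (S8) ---
    rf_real, cls_real = D(vis)
    rf_fake, _ = D(fake.detach())
    loss_D = rf_fake.mean() - rf_real.mean() + F.cross_entropy(cls_real, labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- generator update: adversarial + classification + TR (S7, S9) ---
    rf_fake, cls_fake = D(fake)
    loss_G = (-rf_fake.mean()
              + F.cross_entropy(cls_fake, labels)
              + lambda_tr * tr_fn(fake, labels))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_G.item(), loss_D.item()
```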
Preferably, the classification method comprises the following steps:
step S1: inputting image data to be classified into a visual feature extraction module and a semantic feature extraction module to respectively obtain visual features and semantic features;
step S2: inputting the semantic features into a generator to obtain pseudo visual features;
step S3: calculating the similarity between the visual features and the pseudo-visual features and finding the highest similarity; the category to which the pseudo-visual feature with the highest similarity belongs is the category to which the animal or plant image belongs. An inference sketch of these steps follows.
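The sketch below covers steps S1-S3 at inference time; the cosine similarity measure, the noise dimension and the per-class sampling count are assumptions, since the patent does not fix the similarity measure.

```python
# Minimal inference sketch: generate pseudo-visual features per candidate class,
# then assign the class whose pseudo feature is most similar (cosine assumed).
import torch
import torch.nn.functional as F

@torch.no_grad()
def classify(G, visual_feat, class_semantics, n_samples=32):
    """visual_feat: (d,) real visual feature of the query image;
    class_semantics: (C, s) one semantic vector per candidate class."""
    best_class, best_sim = None, -1.0
    for c, sem in enumerate(class_semantics):
        z = torch.randn(n_samples, 128)
        sem_rep = sem.unsqueeze(0).expand(n_samples, -1)
        pseudo = G(torch.cat([sem_rep, z], dim=1))       # (n_samples, d)
        sim = F.cosine_similarity(pseudo, visual_feat.unsqueeze(0)).max()
        if sim > best_sim:
            best_class, best_sim = c, sim.item()
    return best_class
```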
Compared with the prior art, the invention has the beneficial effects that:
1. Cross Knowledge Learning (CKL) is adopted to train more relevant semantic features, improving the semantic-to-visual feature embedding in ZSL and enriching the semantic features in the cross-modal learning process;
2. the invention uses taxonomic regularization to generate more universal visual features so as to increase the overlap with unseen classes in ZSL, thereby significantly relieving the adverse effect of the cross-domain problem;
3. the model and the method are simple and efficient, and high-accuracy classification results are obtained on a plurality of authoritative data sets.
Drawings
FIG. 1 is a diagram of a classification model framework according to the present invention;
FIG. 2 is a block diagram of the classification method according to the present invention.
Detailed Description
The invention will be further described with reference to specific embodiments, and the advantages and features of the invention will become apparent as the description proceeds. The examples are illustrative only and do not limit the scope of the present invention in any way. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention, and that such changes and substitutions are intended to be within the scope of the invention.
As shown in FIG. 1, a zero-shot image classification model based on cross knowledge comprises:
A biological classification tree module: constructing a biological classification tree according to all categories in the data set;
a visual feature extraction module: used for converting the images in the data set into one-dimensional visual features;
a semantic feature extraction module: used for converting the texts or attributes in the data set into one-dimensional semantic features;
a cross knowledge learning module: used for enriching the semantic information of the categories;
a generative adversarial network module: comprising a generator and a discriminator, wherein the generator generates pseudo-visual features from semantic features, and the discriminator discriminates the authenticity and the category of an image.
Preferably, the biological classification tree includes a Family level, a Genus level and a Species level, the Species level including all categories in the data set.
Preferably, the visual feature extraction module adopts ResNet101, and the semantic feature extraction module adopts Term Frequency-Inverse Document Frequency (TF-IDF).
Preferably, the cross knowledge learning module merges the classes into the Family level and the Genus level of the biological classification tree using biological taxonomy, and cross learning is performed within the Family level, the Genus level and the Species level respectively.
Preferably, the generator loss is expressed as

L_G = -E_{z~P_z}[D_ω(G_θ(t_s, A, z))] + L_cls(D_ω(G_θ(t_s, A, z))) + L_T

[The discriminator loss L_D, the term L_T, and the quantity K_{s,F} are given only as formula images in the original publication.]
preferably, the generator employs taxonomic regularization.
Preferably, the data set is animal and plant image data.
Preferably, the training method of the model comprises the following steps:
step S1: constructing a biological classification tree according to all class names in the data set;
step S2: respectively inputting the images and the texts or attribute descriptions in the data set into the visual feature extraction module and the semantic feature extraction module, and extracting visual vectors and semantic vectors;
step S3: constructing visual feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the visual vectors;
step S4: constructing semantic feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the semantic vectors;
step S5: initializing the discriminator and the generator with taxonomic regularization;
step S6: cross-selecting semantic features and visual features from the visual feature data set and the semantic feature data set of the same level, and cross-training with the losses L_G and L_D of the generative adversarial network;
step S7: inputting the semantic features into the generator to obtain pseudo-visual features, and calculating the TR regularization term from the pseudo-visual features and the visual features;
step S8: inputting the pseudo-visual features and the visual features into the discriminator, and calculating the classification loss and the real/fake loss of L_D;
step S9: calculating gradients from TR, the classification loss and the real/fake loss, and updating L_G and L_D;
step S10: iterating S6-S9 until the termination condition is reached.
Preferably, the termination condition is the number of iterations set before training, and the number of iterations is 5000-10000.
Preferably, the classification method, as shown in FIG. 2, includes the following steps:
step S1: inputting image data to be classified into a visual feature extraction module and a semantic feature extraction module to respectively obtain visual features and semantic features;
step S2: inputting the semantic features into a generator to obtain pseudo visual features;
step S3: calculating the similarity between the visual features and the pseudo-visual features and finding the highest similarity; the category to which the pseudo-visual feature with the highest similarity belongs is the category to which the animal or plant image belongs.
Example 1
A zero-shot image classification model training method based on cross knowledge comprises the following steps:
step S1: constructing a biological classification tree according to all class names in the data set;
step S2: respectively inputting the images and the texts or attribute descriptions in the data set into the visual feature extraction module and the semantic feature extraction module, and extracting visual vectors and semantic vectors;
step S3: constructing visual feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the visual vectors;
step S4: constructing semantic feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the semantic vectors;
step S5: initializing the discriminator and the generator with taxonomic regularization;
step S6: cross-selecting semantic features and visual features from the visual feature data set and the semantic feature data set of the same level, and cross-training with the losses L_G and L_D of the generative adversarial network;
step S7: inputting the semantic features into the generator to obtain pseudo-visual features, and calculating the TR regularization term from the pseudo-visual features and the visual features;
step S8: inputting the pseudo-visual features and the visual features into the discriminator, and calculating the classification loss and the real/fake loss of L_D;
step S9: calculating gradients from TR, the classification loss and the real/fake loss, and updating L_G and L_D;
step S10: iterating S6-S9 until the termination condition is reached.
Preferably, the termination condition is the number of iterations set before training, and the number of iterations is 5000.
Example 2
A zero-shot image classification model training method based on cross knowledge comprises the following steps:
step S1: constructing a biological classification tree according to all class names in the data set;
step S2: respectively inputting the images and the texts or attribute descriptions in the data set into the visual feature extraction module and the semantic feature extraction module, and extracting visual vectors and semantic vectors;
step S3: constructing visual feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the visual vectors;
step S4: constructing semantic feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the semantic vectors;
step S5: initializing the discriminator and the generator with taxonomic regularization;
step S6: cross-selecting semantic features and visual features from the visual feature data set and the semantic feature data set of the same level, and cross-training with the losses L_G and L_D of the generative adversarial network;
step S7: inputting the semantic features into the generator to obtain pseudo-visual features, and calculating the TR regularization term from the pseudo-visual features and the visual features;
step S8: inputting the pseudo-visual features and the visual features into the discriminator, and calculating the classification loss and the real/fake loss of L_D;
step S9: calculating gradients from TR, the classification loss and the real/fake loss, and updating L_G and L_D;
step S10: iterating S6-S9 until the termination condition is reached.
Preferably, the termination condition is the number of iterations set before training, and the number of iterations is 7500.
Example 3
A zero-shot image classification model training method based on cross knowledge comprises the following steps:
step S1: constructing a biological classification tree according to all class names in the data set;
step S2: respectively inputting the images and the texts or attribute descriptions in the data set into the visual feature extraction module and the semantic feature extraction module, and extracting visual vectors and semantic vectors;
step S3: constructing visual feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the visual vectors;
step S4: constructing semantic feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the semantic vectors;
step S5: initializing the discriminator and the generator with taxonomic regularization;
step S6: cross-selecting semantic features and visual features from the visual feature data set and the semantic feature data set of the same level, and cross-training with the losses L_G and L_D of the generative adversarial network;
step S7: inputting the semantic features into the generator to obtain pseudo-visual features, and calculating the TR regularization term from the pseudo-visual features and the visual features;
step S8: inputting the pseudo-visual features and the visual features into the discriminator, and calculating the classification loss and the real/fake loss of L_D;
step S9: calculating gradients from TR, the classification loss and the real/fake loss, and updating L_G and L_D;
step S10: iterating S6-S9 until the termination condition is reached.
Preferably, the termination condition is the number of iterations set before training, and the number of iterations is 10000.
Example 4
This embodiment was evaluated on four data sets: CUB (Caltech-UCSD Birds-200-2011), NAB (North America Birds), aPY (Attribute Pascal and Yahoo) and AwA2 (Animals with Attributes 2).
Data set details are shown in Table 1. We use the CUB and NAB data sets as Wikipedia-based ZSL benchmarks and the AwA2 and aPY data sets as attribute-based ZSL benchmarks. For Wikipedia-based ZSL, we use TF-IDF to extract 7551-dimensional features for the CUB data set and 13217-dimensional features for NAB. For attribute-based ZSL, we directly use the attributes provided with the original data sets as semantic features. Likewise, we directly use the visual features provided with the original data sets, extracted with a pre-trained ResNet101.
1. Data set partitioning scenarios
Table 1 data set details
[Table 1 appears only as images in the original publication.]
Note: S denotes the dimension of the semantic features, Type the type of the semantic features, X the number of images, Y the total number of seen and unseen classes, Ys the number of seen classes, and Yu the number of unseen classes.
There are two partitioning strategies for the CUB and NAB data sets: Super-Category-Shared (SCS, easy) and Super-Category-Exclusive (SCE, hard). The two strategies differ in whether seen and unseen classes share the same parent class. In the SCS partition, every unseen class shares a parent class with one or more seen classes; for example, the seen class "Indigo Bunting" and the unseen class "Lazuli Bunting" have the same parent class "Bunting". In the SCE partition, unseen classes never share a parent class with seen classes. Seen and unseen classes are therefore highly correlated under SCS and only weakly correlated under SCE, so zero-shot classification and retrieval are more difficult under SCE than under SCS. The check below illustrates the distinction.
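A small helper makes the distinction concrete; the parent_of mapping from each class to its super-category is a hypothetical input, not data from the patent.

```python
# Sketch of the property distinguishing the two splits: under SCS every unseen
# class shares a parent (super-category) with some seen class; under SCE none do.
def split_type(seen, unseen, parent_of):
    seen_parents = {parent_of[c] for c in seen}
    shared = [c for c in unseen if parent_of[c] in seen_parents]
    if len(shared) == len(unseen):
        return "SCS (Super-Category-Shared)"
    if not shared:
        return "SCE (Super-Category-Exclusive)"
    return "mixed"
```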
2. Two evaluation criteria are used in this embodiment (a sketch of both metrics follows the list):
Top-1 accuracy: the rate at which the top-ranked predicted category matches the ground truth;
Area Under Seen-Unseen Accuracy Curve (AUSUC): the area under the curve that trades off seen-class accuracy against unseen-class accuracy.
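The patent only names these metrics; the AUSUC computation below follows the common calibrated-stacking construction (sweeping a calibration factor that penalizes seen-class scores), which is an assumption rather than a detail from the patent.

```python
import numpy as np

def top1_accuracy(scores, labels):
    """Top-1 accuracy: fraction of samples whose highest-scoring class is correct."""
    return float(np.mean(scores.argmax(axis=1) == labels))

def ausuc(scores, labels, seen_mask, gammas=np.linspace(-5.0, 5.0, 201)):
    """Area Under the Seen-Unseen accuracy Curve via calibrated stacking.
    scores: (n, C) class scores; labels: (n,) ground truth; seen_mask: (C,) bool."""
    sample_is_seen = seen_mask[labels]
    acc_s, acc_u = [], []
    for g in gammas:
        pred = (scores - g * seen_mask).argmax(axis=1)  # penalize seen-class scores
        acc_s.append(np.mean(pred[sample_is_seen] == labels[sample_is_seen]))
        acc_u.append(np.mean(pred[~sample_is_seen] == labels[~sample_is_seen]))
    order = np.argsort(acc_u)  # integrate seen accuracy over unseen accuracy
    return float(np.trapz(np.array(acc_s)[order], np.array(acc_u)[order]))
```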
Table 2 compares the present method with the current best-accuracy methods under the different splits of CUB and NAB. The superscripts attached to the numbers denote the increase or decrease in accuracy relative to the corresponding baseline method. The invention clearly outperforms CIZSL and the other baseline methods.
TABLE 2 comparison of ZSL image classification results on CUB and NAB
[Table 2 appears only as images in the original publication.]
To show that our approach is effective under different semantic representations, we follow the GBU setting and replace the Wikipedia textual semantic representation with attributes. As shown in Table 3, our approach significantly outperforms all other methods on the AwA2 and aPY data sets.
TABLE 3 comparison of ZSL image classifications on AwA2 and aPY datasets
[Table 3 appears only as images in the original publication.]
As shown in Tables 2 and 3, the experimental results improve significantly over the baselines once CKL and TR are used. This shows that the invention not only effectively helps the network distinguish seen categories from unseen categories, but also reduces the prediction space of generalized zero-shot image classification, mitigating the cross-domain and cross-modal problems.
In summary, compared with the baseline methods, the method of this embodiment achieves better results on the evaluation indices, verifying the effectiveness of CKL and TR. In addition, the model has good modularity: the zero-shot image classification model and the generative adversarial model are trained separately.

Claims (8)

1. A zero-shot image classification model based on cross knowledge, characterized by comprising:
a biological classification tree module: constructing a biological classification tree according to all categories in the data set;
a visual feature extraction module: used for converting the images in the data set into one-dimensional visual features;
a semantic feature extraction module: used for converting the texts or attributes in the data set into one-dimensional semantic features;
a cross knowledge learning module: used for enriching the category semantic information, which processes the data obtained by the biological classification tree module, the visual feature extraction module and the semantic feature extraction module and transmits them to the generative adversarial network module;
a generative adversarial network module: comprising a generator and a discriminator, wherein the generator generates pseudo-visual features from semantic features, and the discriminator discriminates the authenticity and the category of an image according to the pseudo-visual features and the visual features;
the biological classification tree comprises a Family level, a Genus level and a Species level, the Species level comprising all categories in the data set; the cross knowledge learning module merges the classes into the Family level and the Genus level of the biological classification tree using biological taxonomy, and performs cross learning within the Family level, the Genus level and the Species level respectively.
2. The cross-knowledge-based image zero-shot classification model of claim 1, wherein the visual feature extraction module adopts ResNet101 and the semantic feature extraction module adopts Term Frequency-Inverse Document Frequency (TF-IDF).
3. The cross-knowledge-based image zero-shot classification model of claim 1, wherein the generator loss is expressed as

L_G = -E_{z~P_z}[D_ω(G_θ(t_s, A, z))] + L_cls(D_ω(G_θ(t_s, A, z))) + L_T

[The discriminator loss L_D, the term L_T, and the quantity K_{s,F} are given only as formula images in the original publication.]
Visual feature data sets of the Family level, the Genus level and the Species level are constructed according to the biological classification tree and the visual vectors.
4. The cross-knowledge-based image zero-shot classification model according to claim 3, wherein the generator employs taxonomic regularization.
5. The cross-knowledge-based image zero-shot classification model according to claim 1, wherein the data set is animal and plant image data.
6. The training method of the cross-knowledge-based image zero-shot classification model according to claim 1, wherein the training method comprises the following steps:
step S1: constructing a biological classification tree according to all class names in the data set;
step S2: respectively inputting the images and the texts or attribute descriptions in the data set into the visual feature extraction module and the semantic feature extraction module, and extracting visual vectors and semantic vectors;
step S3: constructing visual feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the visual vectors;
step S4: constructing semantic feature data sets of the Family level, the Genus level and the Species level according to the biological classification tree and the semantic vectors;
step S5: initializing the discriminator and the generator with taxonomic regularization;
step S6: cross-selecting semantic features and visual features from the visual feature data set and the semantic feature data set of the same level, and cross-training with the losses L_G and L_D of the generative adversarial network;
step S7: inputting the semantic features into the generator to obtain pseudo-visual features, and calculating the TR regularization term from the pseudo-visual features and the visual features;
step S8: inputting the pseudo-visual features and the visual features into the discriminator, and calculating the classification loss and the real/fake loss of L_D;
step S9: calculating gradients from TR, the classification loss and the real/fake loss, and updating L_G and L_D;
step S10: iterating S6-S9 until the termination condition is reached.
7. The cross-knowledge-based training method for the image zero-shot classification model according to claim 6, wherein the termination condition is the number of iterations set before training, the number of iterations being 5000 to 10000.
8. The classification method of the cross-knowledge-based image zero-shot classification model according to claim 1, characterized by comprising the following steps:
step S1: inputting image data to be classified into a visual feature extraction module and a semantic feature extraction module to respectively obtain visual features and semantic features;
step S2: inputting the semantic features into a generator to obtain pseudo visual features;
step S3: calculating the similarity between the visual features and the pseudo-visual features and finding the highest similarity; the category to which the pseudo-visual feature with the highest similarity belongs is the category to which the image belongs.
CN202011402935.4A 2020-12-04 2020-12-04 Image zero-shot classification model based on cross knowledge and classification method thereof Active CN113191381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011402935.4A CN113191381B (en) 2020-12-04 2020-12-04 Image zero-shot classification model based on cross knowledge and classification method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011402935.4A CN113191381B (en) 2020-12-04 2020-12-04 Image zero-shot classification model based on cross knowledge and classification method thereof

Publications (2)

Publication Number Publication Date
CN113191381A CN113191381A (en) 2021-07-30
CN113191381B (en) 2022-10-11

Family

ID=76972575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011402935.4A Active CN113191381B (en) 2020-12-04 2020-12-04 Image zero-shot classification model based on cross knowledge and classification method thereof

Country Status (1)

Country Link
CN (1) CN113191381B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154645B (en) * 2021-12-03 2022-05-17 中国科学院空间应用工程与技术中心 Cross-center image joint learning method and system, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598759A (en) * 2019-08-23 2019-12-20 天津大学 Zero sample classification method for generating countermeasure network based on multi-mode fusion
CN111563554A (en) * 2020-05-08 2020-08-21 河北工业大学 Zero sample image classification method based on regression variational self-encoder

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10908616B2 (en) * 2017-05-05 2021-02-02 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
CN109492662B (en) * 2018-09-27 2021-09-14 天津大学 Zero sample image classification method based on confrontation self-encoder model
CN109816032B (en) * 2019-01-30 2020-09-11 中科人工智能创新技术研究院(青岛)有限公司 Unbiased mapping zero sample classification method and device based on generative countermeasure network
CN110443293B (en) * 2019-07-25 2023-04-07 天津大学 Zero sample image classification method for generating confrontation network text reconstruction based on double discrimination
CN110580501B (en) * 2019-08-20 2023-04-25 天津大学 Zero sample image classification method based on variational self-coding countermeasure network
CN110795585B (en) * 2019-11-12 2022-08-09 福州大学 Zero sample image classification system and method based on generation countermeasure network
CN111026898A (en) * 2019-12-10 2020-04-17 云南大学 Weak supervision image emotion classification and positioning method based on cross space pooling strategy
CN111476294B (en) * 2020-04-07 2022-03-22 南昌航空大学 Zero sample image identification method and system based on generation countermeasure network
CN111914929B (en) * 2020-07-30 2022-08-23 南京邮电大学 Zero sample learning method
CN112017182B (en) * 2020-10-22 2021-01-19 北京中鼎高科自动化技术有限公司 Industrial-grade intelligent surface defect detection method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598759A (en) * 2019-08-23 2019-12-20 天津大学 Zero sample classification method for generating countermeasure network based on multi-mode fusion
CN111563554A (en) * 2020-05-08 2020-08-21 河北工业大学 Zero sample image classification method based on regression variational self-encoder

Also Published As

Publication number Publication date
CN113191381A (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN111581405B (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN104899253B (en) Towards the society image across modality images-label degree of correlation learning method
CN107766933B (en) Visualization method for explaining convolutional neural network
Sousa et al. Sketch-based retrieval of drawings using spatial proximity
CN113065577A (en) Multi-modal emotion classification method for targets
CN105389326B (en) Image labeling method based on weak matching probability typical relevancy models
CN107862561A (en) A kind of method and apparatus that user-interest library is established based on picture attribute extraction
CN111324765A (en) Fine-grained sketch image retrieval method based on depth cascade cross-modal correlation
CN110472652A (en) A small amount of sample classification method based on semanteme guidance
CN110956044A (en) Attention mechanism-based case input recognition and classification method for judicial scenes
CN112818889A (en) Dynamic attention-based method for integrating accuracy of visual question-answer answers by hyper-network
CN114461890A (en) Hierarchical multi-modal intellectual property search engine method and system
Li et al. Multi-view pairwise relationship learning for sketch based 3D shape retrieval
CN113191381B (en) Image zero-order classification model based on cross knowledge and classification method thereof
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
Mi et al. Knowledge-aware cross-modal text-image retrieval for remote sensing images
CN107908749A (en) A kind of personage's searching system and method based on search engine
Averbuch‐Elor et al. Distilled collections from textual image queries
CN116579348A (en) False news detection method and system based on uncertain semantic fusion
CN116562280A (en) Literature analysis system and method based on general information extraction
Rao et al. Deep learning-based image retrieval system with clustering on attention-based representations
Liu et al. A method of measuring the semantic gap in image retrieval: Using the information theory
CN115359486A (en) Method and system for determining custom information in document image
Ye et al. Cross-modality pyramid alignment for visual intention understanding
Tian et al. Research on image classification based on a combination of text and visual features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant