CN109582960B

CN109582960B - Zero example learning method based on structured association semantic embedding

Info

Publication number: CN109582960B
Application number: CN201811423691.0A
Authority: CN
Inventors: 熊红凯; 陈宇
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2020-11-24
Anticipated expiration: 2038-11-27
Also published as: CN109582960A

Abstract

The invention provides a zero-instance learning method based on structured association semantic embedding, which constructs a semantic embedding space by extracting attribute features of an input image, excavates a semantic embedding association relation in the semantic embedding space and constructs a local linear correlation graph structure, and then utilizes the structured restriction of the semantic embedding space to constrain a classification model of a known class, thereby finally synthesizing a model of an unknown class and realizing the effective prediction of the class of an untrained sample. The invention can keep the global structure of the semantic space while digging the semantic embedded information with similar categories, can obtain effective semantic representation and effectively improve the category prediction accuracy of unseen category image samples.

Description

Zero example learning method based on structured association semantic embedding

Technical Field

The invention relates to the technical field of computer vision, in particular to a zero-example learning method based on structured association semantic embedding.

Background

Zero example learning is a very important and challenging problem in the field of computer vision. State of the art supervised learning approaches have achieved significant achievements in a number of areas, however the performance of supervised learning is highly dependent on large scale labeled data. Conversely, zero-instance learning may take knowledge from samples of trained classes, migrate to samples of untrained classes, and enable differentiation of untrained class samples. To achieve this goal, it is important that the knowledge between the trained class samples and the untrained class samples be migrated and that the untrained class samples be efficiently described.

Through the literature search of the prior art, l.c.h, n.hannes and h.stafan, in the article "Attribute-based classification for zero-shot visual object classification" published in IEEE Transactions on Pattern Analysis and Machine Analysis conference 2014, a method for learning an Attribute classifier is proposed, i.e., a classifier is learned for each Attribute, and samples with no category are distinguished through the Attribute classifier output of discriminant test data. However, a bias is introduced in the process of establishing the classifier, so that the discrimination of the unseen class sample is influenced. To solve this problem, researchers have proposed to directly measure the correlation between images and class embedding, by learning a linear or non-linear compatibility function, to directly correlate images with class embedding. However, such methods can be biased when the training data is not consistent with the test data distribution, so researchers have attempted to solve the zero-instance learning problem using fusion model-based methods, i.e., considering class embedding as a combination of already seen class components. The 'Zero-shot Learning by y context communication of semantic embedding of unseen classes' published by Mohammad et al in the International Conference on Learning retrieval meeting of 2014 proposes that semantic embedding of unseen classes is combined and synthesized by the embedded weights of the seen classes, and the combined weights are corresponding probability outputs of pre-trained classifiers thereof. However, this method requires training the classifier in advance, which is time consuming and inefficient. Soravit et al, in the article "Synthesized classifiers for zero-shot learning" published at the IEEE Conference on Computer Vision and Pattern Recognition Conference of 2016, proposed a method for implementing semantic embedding and knowledge migration between classifiers using pseudo-classes, and synthesizing new classifiers by convex combination of the pseudo-classes. This approach ignores the semantically embedded local structure, which may provide more information for the exact synthesis classifier.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a zero-example learning method based on structured association semantic embedding.

The invention provides a zero example learning method based on structured association semantic embedding, which comprises the following steps:

step 1: extracting attribute features of the known category images;

step 2: establishing a semantic embedding space according to the attribute characteristics, and establishing a model space formed by classification models of known class samples;

and step 3: introducing R pseudo categories into the semantic embedding space and the model space to serve as relationship migration carriers of the semantic embedding space and the model space;

and 4, step 4: acquiring a scaled squared Euclidean distance between a real category and a pseudo category of a known category image, and taking the scaled squared Euclidean distance as a connection weight;

and 5: reconstructing a graph structure of a semantic embedding space in a model space according to the connection weight;

step 6: obtaining R of each real category of the known category image with the connection weight within a preset range_nA pseudo category;

and 7: constructing a set of semantic embedding rules for each semantic embedding in the semantic embedding space_nAn objective function synthesized by pseudo classes;

and 8: constructing a classification model, wherein the classification model consists of contiguous R_nSynthesizing the pseudo classes;

and step 9: learning R pseudo classes, R, of the semantic embedding space and model space according to the objective function_nPseudo classes to minimize loss of the classification model;

step 10: extracting attribute features of unknown image, and using R as said attribute features_nCharacterizing each pseudo category to obtain a graph structure of the unknown category image;

step 11: and projecting the graph structure of the unknown class image from the semantic embedding space to a model space to obtain projected pseudo classes, and synthesizing the classification model of the unknown class image according to the projected pseudo classes.

Optionally, the step 4 includes:

in a training sample, the number of known classes is S, the number of unknown classes is U, and the number of pseudo classes is R; defining the coordinates of each class i as s_i(i ═ 1,2, …, S + U), all classes of the pre-set training samples are on the same manifold of the semantic embedding space, with the help of the introduced pseudo-class p_m(m-1, 2, …, R) preserving the correlation between training sample attributes, presetting each semantic embedding and adjacent R_nPseudo classes are embedded in local linear blocks of the space manifold semantically; defining the scaled squared Euclidean distance as a similarity measure; each s_iPseudo class p with neighbors_mThe calculation formula of the connection weight between the two is as follows:

wherein s is_iSemantic embedding representing class i, with neighboring pseudo-classes sharing R_nA plurality of; p is a radical of_mDenotes s_iNearest neighbor R_nOne of the pseudo categories; w is a_miEach representsS is_iPseudo class p of its neighbors_mWeight of connection between w_miThe larger the value of (A), the more the pseudo class p is represented_mEmbedding with semantics s_iThe greater the similarity between;

embedding R corresponding to neighbor according to each semantic_nPseudo categories, namely constructing a category association diagram of a semantic embedding space; the vertex set of the class association graph comprises semantically embedded attributes and introduced pseudo classes, and the edge set comprises edges connecting the attributes and the pseudo classes; the semantic embedding representation of each pseudo category neighborhood is as follows:

wherein, mu_miRepresenting the weight between semantic embeddings and pseudo classes.

Optionally, the step 8 includes:

defining the classification model corresponding to each class i as u_i(i ═ 1,2, …, S + U), all classification models are preset on the same manifold, and similarly, R pseudo classes are introduced on the manifold, then each classification model U_iWeights w that can be embedded in space by semantics_miAnd R_nA contiguous pseudo class t_m(m ═ 1,2, …, R) represents:

wherein: u. of_iRepresenting a classification model; t is t_mRepresenting a pseudo category; w is a_miRepresents each s_iPseudo class p of its neighbors_mThe weight of the connection between.

Optionally, the formula of the objective function is as follows:

wherein:

pseudo class t representing a learning model space_mAnd weight μ of semantic embedding space_miMinimizing multi-class classification loss, R representing a total number of false classes, S representing a total number of seen classes, R_nNumber of pseudo classes, x, representing the neighbourhood of each class i_kRepresents input data, λ represents a weighting coefficient of the classification model, and γ represents a weighting coefficient of the weight;

the first term of the objective function is the squared chained penalty with a one-to-many classification strategy

Wherein u is_iClassifier, x, representing class i_kRepresenting input data for ensuring that the classifier is more concerned with gross classification errors, such that gross classification errors tend to be minimal;

indicating function

Represents a category y_kWhether the same as the category i is detected, if the same as the category i, 1 is selected, and if not, 1 is selected; the second term of the objective function is the classification model u_iTo prevent overfitting;

the third target item of the target function is the weight mu of semantic embedding space_miConstraint is carried out to lead the film to tend to be sparse; | g | represents a modulo operation;

representing a norm operation.

Optionally, when the category correlation is considered, the method is further experimented by using a Crammer-Singer multi-category support vector machine strategy, that is, the first term of the objective function is

Is replaced by

Wherein: i belongs to S- { y_kIndicates that category i belongs to the known category set S except for category y_kOne of the other categories;

representing an optimized class i such that it is associated with class y_kThe difference between the semantic embedding space and the model space is maximum; s_gSemantic embedding of the representation class g in the semantic embedding space, u_gDenotes the classifier, x, corresponding to the class g_kRepresenting input data; u. of_iA classifier representing the class i is provided,

represents a category y_kA classifier of_structRepresenting Cramer-Singer multi-class losses, s, taking into account class correlations_iThe semantic embedding representing the category i is performed,

represents a category y_kThe semantic embedding of (2) is carried out,

denotes s_iAnd

performing norm operation of the difference value;

optionally, the objective function is solved by using an alternative optimization method.

Compared with the prior art, the invention has the following beneficial effects:

the zero example learning method based on the structural association semantic embedding provided by the invention assumes and utilizes more structural associations in the semantic embedding space through the zero example learning frame, utilizes the semantic representation of similar categories to represent the assumed conditions which can be projected to the adjacent positions in the embedding space, and can be helpful for predicting the classifier of the unseen categories. The invention can keep the global structure of the semantic space while digging the semantic embedded information with similar categories, can obtain effective semantic representation and effectively improve the category prediction accuracy of unseen category image samples.

In the alternative, the present invention takes into account the proximity embedding associations of similar classes, exploits the structured associations and proposes to predict a new classifier by building a locally linearly connected graph structure for unseen classes. The invention reserves the global structure while utilizing the local incidence relation, and obtains more effective semantic information representation by strengthening the influence of adjacent embedding.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic diagram illustrating a schematic structure of a zero-example learning method based on structured association semantic embedding according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a process of migration of a structural relationship in an embodiment of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that it would be obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit of the invention. All falling within the scope of the present invention.

Fig. 1 is a schematic structural diagram of a principle of a zero-instance learning method based on structured association semantic embedding according to an embodiment of the present invention, as shown in fig. 1, the embodiment includes: constructing a class relation of a manifold space, constructing a structural constraint classification model, and synthesizing an unseen class model, wherein: the category relation construction is that a graph structure retaining a local linear relation is constructed for unseen category samples by mining a structural relation of similar category semantic embedding and considering the local adjacency relation of similar categories; the structured constraint classification model introduces a pseudo category into a semantic space, and projects a graph structure of the semantic space to a model space formed by a known category classifier by means of the pseudo category; and synthesizing the unseen type model by utilizing the projected pseudo type and the weight of the semantic space to reconstruct the image structure in the model space and synthesizing the unseen type model. The process of structured relationship migration is shown in FIG. 2.

The method firstly calculates R embedded in each semantic in the training process_nNearest neighbor and form a graph structure with local linear correlation. And then projecting the pseudo categories in the semantic space into a model space. In the prediction phase, the semantics of each unseen category are embedded by its R_nAnd (3) expressing the pseudo classes of the nearest neighbors, projecting the pseudo classes to a model space from a semantic embedding space, reserving the weight in the semantic embedding space, and reconstructing the model in the model space to synthesize a new classifier and distinguish unseen classes. Since overlapping local structures can provide global aggregate information, the method preserves global structures in the semantic space while embedding local distributions with semantics. When the local semantic representation is projected into the model space, the global non-linear structure is also recovered from the local connection relations. The optimization objectives of the training process are as follows:

wherein:

pseudo class t representing a learning model space_mAnd weight μ of semantic space_miMinimizing multi-class classification loss, R representing a total number of false classes, S representing a total number of seen classes, R_nNumber of pseudo classes, x, representing the nearest neighbor of each class i_kRepresents the input data, λ represents the weighting coefficient of the classification model, and γ represents the weighting coefficient of the weight. The first term of the objective function is the square chain loss l (x, y; u) ═ max (0, 1-yu) using a one-to-many classification strategy^Tx)²Wherein y represents a class label, u represents a classifier, and x represents input data, which is used to ensure that the classifier focuses more on overall classification errors, so that the overall classification errors tend to be minimal; indicating function

Represents a category y_kWhether the same as the category i is detected, if the same as the category i, 1 is selected, and if not, 1 is selected; the second term of the objective function is the classification model u_iTo prevent overfitting; the third target item is the weight mu to semantic space_miConstraint is carried out to lead the film to tend to be sparse; | g | represents a modulo operation;

representing a norm operation. In addition, in consideration of the class correlation, the first term of the objective function is also tested by using a Crammer-Singer multi-class support vector machine, namely

representing an optimized class i such that it is associated with class y_kThe difference is greatest in both the semantic space and the model space. s_gSemantic embedding of the representation class g in the semantic space, u_gDenotes the classifier, x, corresponding to the class g_kRepresenting the input data.

The invention obtains attribute representations at a class level by averaging the attribute representations of images of the same class. All variables are initialized randomly, and the hyper-parameters in the objective function are obtained by performing cross validation on the training data of each data set. The cross-validation strategy in this embodiment employs class-based segmentation of the training data rather than the traditional direct segmentation of the training data. Further, the unconnected edge distance in the constructed graph model is set to ∞. Further, the number of neighboring pseudo classes R_nInitially set to a decimal multiple of the number R of pseudo classes and then gradually narrowed to obtain more accurate results depending on the change in classification accuracy.

In this embodiment, training and testing are performed on the AWA data set and the SUN data set, respectively. The AWA dataset is a dataset of animal attributes containing 85 binary attributes and 30475 pictures, for a total of 50 categories. The SUN dataset is a dataset of scene attributes, which contains 102 attributes and 14340 pictures, for a total of 717 scene categories. In this embodiment, the semantic space attribute features adopted by the AWA data set are extracted and obtained by AlexNet, and the semantic space attribute features adopted by the SUN data set are extracted and obtained by GoogleNet. The effectiveness and the generalization of semantic representation extraction are evaluated through the multi-class classification accuracy of untrained class images. The classification accuracy of the zero-example learning method based on structured relevance semantic embedding and the zero-example learning method of the synthetic classifier proposed by Soravit et al in "Synthesized classifiers for zero-shot learning" under two loss strategies of a pair of rest and structured SVM are compared.

The result is: by applying a zero-example learning method of structured associated semantic embedding, the classification accuracy of a test sample under an AWA data set reaches 72.3% and 74.6% respectively under a pair of other loss strategies and structured SVM loss strategies, and is respectively improved by 2.6% and 2.3% compared with a method for synthesizing a classifier model; the classification accuracy of the test sample under the SUN data set respectively reaches 62.4% and 62.7% under a pair of loss strategies of the rest SVM and the structured SVM, and is respectively improved by 0.3% and 0.3% compared with the method for synthesizing the classifier model. Due to the fact that structural association of semantic embedding is considered, the method obtains higher classification accuracy under different loss conditions, and the effectiveness of the method is verified through experimental results.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A zero example learning method based on structured association semantic embedding is characterized by comprising the following steps:

step 1: extracting attribute features of the known category images;

and step 9: learning R pseudo classes, adjacent R of the semantic embedding space and the model space according to the objective function_nPseudo classes to minimize loss of the classification model;

step 10: extracting attribute features of an unknown type image, and using adjacent R for the attribute features_nCharacterizing each pseudo category to obtain a graph structure of the unknown category image;

step 11: projecting the graph structure of the unknown class image from the semantic embedding space to a model space to obtain projected pseudo classes, and synthesizing a classification model of the unknown class image according to the projected pseudo classes;

the step 8 comprises the following steps:

defining the classification model corresponding to each class i as u_i(i ═ 1,2, …, S + U), preset all classification modesThe types are on the same manifold, and similarly, R pseudo classes are introduced on the manifold, so that each classification model u_iWeights w that can be embedded in space by semantics_miAnd R_nA contiguous pseudo class t_mRepresents:

wherein: u. of_iRepresenting a classification model; t is t_mRepresenting a pseudo category; w is a_miRepresents each s_iPseudo class p of its neighbors_mA connection weight between; s_iSemantic embedding representing category i;

the formula of the objective function is as follows:

wherein:

pseudo class t representing a learning model space_mAnd weight μ of semantic embedding space_miMinimizing multi-class classification loss, m 1,2, …, R, R representing the total number of false classes, S representing the total number of seen classes, R representing the total number of false classes_nNumber of pseudo classes, x, representing the neighbourhood of each class i_kRepresents input data, λ represents a weighting coefficient of the classification model, and γ represents a weighting coefficient of the weight;

the first term of the objective function is the squared chaining penalty with a one-to-many, multi-classification strategy

Wherein u is_iClass model, x, representing class i_kRepresenting input data for ensuring that the classification model is more concerned with gross classification errorsThe total classification error tends to be minimal;

indicating function

representing a norm operation.

2. The zero-example learning method based on structured association semantic embedding according to claim 1, wherein the step 4 comprises:

wherein s is_iSemantic embedding representing class i, with neighboring pseudo-classes sharing R_nA plurality of; p is a radical of_mDenotes s_iNearest neighbor R_nOne of the pseudo categories; w is a_miRepresents each s_iIs close to itNeighbor pseudo class p_mWeight of connection between w_miThe larger the value of (A), the more the pseudo class p is represented_mEmbedding with semantics s_iThe greater the similarity between;

embedding R corresponding to neighbor according to each semantic_nPseudo categories, namely constructing a category association diagram of a semantic embedding space; the vertex set of the category association graph comprises attributes embedded by semantics and introduced pseudo categories, and the edge set comprises edges connected between the attributes and the pseudo categories; the semantic embedding representation of each pseudo category neighborhood is as follows:

3. The zero-example learning method based on structured relevance semantic embedding according to claim 1, wherein the method is further experimented with Cramer-Singer multi-class support vector machine strategy, namely, the first term of the objective function, when considering class correlation

Is replaced by

representing an optimized class i such that it is associated with class y_kThe difference between the semantic embedding space and the model space is maximum; s_gSemantic embedding of the representation class g in the semantic embedding space, u_gIndicates the score corresponding to the category gClass device, x_kRepresenting input data; u. of_iA classification model representing the class i,

represents a category y_kThe semantic embedding of (2) is carried out,

denotes s_iAnd

and (5) performing norm operation of the difference value.

4. The zero-example learning method based on structured relevance semantic embedding according to claim 1, wherein the objective function is solved by adopting an alternative optimization method.