CN112686318B - Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration - Google Patents

Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration

Info

Publication number
CN112686318B
CN112686318B
Authority
CN
China
Prior art keywords
sphere
embedding
category
semantic
alignment
Prior art date
Legal status
Active
Application number
CN202011629663.1A
Other languages
Chinese (zh)
Other versions
CN112686318A (en)
Inventor
张磊
沈佳怡
甄先通
李欣
Current Assignee
Guangdong University of Petrochemical Technology
Original Assignee
Guangdong University of Petrochemical Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Petrochemical Technology filed Critical Guangdong University of Petrochemical Technology
Priority to CN202011629663.1A priority Critical patent/CN112686318B/en
Publication of CN112686318A publication Critical patent/CN112686318A/en
Application granted granted Critical
Publication of CN112686318B publication Critical patent/CN112686318B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The application discloses a zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration, which comprises: construction of the overall system framework, a semantic embedding network parameter learning process, and identification of unseen-category samples. The application proposes a joint objective function in which alpha and beta are hyperparameters tuned during the experiments. The application unifies sphere embedding, sphere alignment and sphere calibration in a single optimization formula to address the semantic gap, pivot and prediction bias problems respectively, and maps the distance between the visual features of an image and the semantic description of a category onto a sphere for calculation. The traditional Euclidean distance ignores angle information, and the cosine distance discards radial distance entirely, so the sphere embedding adopted by the application takes both angle information and radial distance into account more comprehensively. The application further adopts different radial distances for the visible and invisible categories, thereby emphasizing the effect of the unseen-category samples.

Description

Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration
Technical Field
The application relates to the technical field of zero sample learning, in particular to a zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration.
Background
In real-world scenarios, many tasks require identifying instance categories that have never been seen before, which makes conventional training methods unsuitable; zero sample learning emerged to address this. Zero sample learning, also called zero-shot learning, aims to predict and identify the categories of unseen-category data through related prior knowledge learned from the visible-category data in the training set.
Existing methods mainly fall into three lines of work: embedding models, generative models and metric methods. Embedding-model methods transfer knowledge from visible categories to unknown categories by mapping visual-space features onto the category prototypes of the semantic representation. Generative-model methods synthesize samples of unknown classes from the semantic descriptions of the classes, using generative adversarial networks or variational autoencoders, thereby converting zero sample learning into few-shot or many-shot learning. Metric methods choose a suitable distance measure in the embedding space and establish the similarity between visual features and category prototypes.
Existing zero sample learning methods encounter several problems: 1. the semantic gap between visible and invisible categories: existing methods establish the association between known and unknown categories through the semantic space, but describe this association only crudely; 2. the pivot problem: in zero sample learning, samples of many different categories may lie close to only a few category prototypes and far from most others; 3. the prediction bias problem: test images from unknown classes always tend to be identified as known classes that are very close to the unknown class.
The application provides a zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration, giving a unified solution to these three problems, namely the semantic gap, pivot and prediction bias problems, by fusing sphere embedding, sphere alignment and sphere calibration into a single framework.
Disclosure of Invention
The embodiment of the application provides a zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration, comprising: construction of the overall system framework, a semantic embedding network parameter learning process, and identification of unseen-category samples;
The overall system framework construction includes:
The image is embedded through a visual feature embedding network φ, and the category information is embedded through a semantic embedding network. Using the sphere-embedding KL distance, the sphere-alignment function R and the sphere-calibration minimum entropy constraint, an objective function of the following form is constructed:

L = E_{x_n∼D_C^tr}[ D_KL(ỹ_n ‖ p) ] + α·R(η*) + β·E_{x_n∼D_C^tr}[ H[q] ]

where the first term is the sphere-embedding KL distance, i.e. the KL distance between the distribution ỹ_n of the actual labelled sample and the predicted distribution p (the specific steps are given in step 2.5); R denotes the sphere-alignment function, whose specific calculation formula is given in step 2.3; the third term is the minimum entropy constraint of sphere calibration, whose specific calculation formula is given in step 2.4; α and β are hyperparameters tuned during the experiments;
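For illustration only, the following PyTorch-style sketch shows one way the three terms of such an objective could be combined. The function and argument names (joint_objective, log_p, eta_star, and so on) are assumptions for this example, and the squared-error form of R and the softmax forms of p and q are likewise assumed, since the patent's exact formulas appear only as figures in the original publication.

```python
def joint_objective(log_p, y_true, q, cos_theta, eta_star, alpha, beta):
    """All arguments are PyTorch tensors. Combines the three terms of the objective:
    sphere-embedding KL + alpha * sphere-alignment R + beta * sphere-calibration entropy.
    The concrete forms used here are assumptions, not the patent's exact formulas."""
    # Sphere embedding: KL distance to the actual label; for a one-hot label it reduces to -log p_y.
    kl_term = -(y_true * log_p).sum(dim=1).mean()
    # Sphere alignment R: pull pairwise prototype cosines toward the alignment factors eta*.
    r_term = ((cos_theta - eta_star) ** 2).mean()
    # Sphere calibration: expected entropy H[q] of the prediction over the C + U classes.
    entropy_term = -(q * (q + 1e-12).log()).sum(dim=1).mean()
    return kl_term + alpha * r_term + beta * entropy_term
```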
The semantic embedding network parameter learning process comprises the following steps:
Input: the category prototype set A_C of the visible categories, the category prototype set A_U of the unknown categories, the training data set D_C^tr, and the visual feature embedding network φ;
Output: the semantic embedding network parameters;
Step 1: initialization: set the batch size B and the number of iterations l, and initialize the semantic embedding network parameters;
Step 2: for iteration number iter = [1 : l], perform the following operations:
step 2.1: randomly sampling B samples;
Step 2.2: project the visible-category prototypes A_C and the unknown-category prototypes A_U into the sphere embedding space, i.e. for each category prototype a in A_C ∪ A_U, generate its sphere embedding through the semantic embedding network;
Step 2.3: r is calculated according to the following formula:
wherein i and j respectively represent class labels corresponding to prototype a, cos (θ i,j ) Then the cosine distance between the prototypes of the two categories i, j is represented;representing an alignment factor between two categories i, j; the specific calculation is as follows:
wherein a is i Prototype representing class i, a j A prototype representing class j;representing a semantic embedded network;
wherein u represents the uniform alignment factor, i.eRepresenting the balance alignment factor. S-valued represents the semantic alignment factor, i.e. +.>Representing a similarity alignment factor; and λ is a balance parameter that balances semantic alignment and uniform alignment, in [0,1]The value is taken in between;
the balance alignment factor is calculated as follows:
the similarity alignment factor is calculated as follows:
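Since the exact expressions for these factors appear only as figures in the original publication, the sketch below shows one way the alignment factors and R could be realized. The uniform-cosine value -1/(K-1), the use of raw semantic cosines for the similarity factor, the convex combination by λ, and the squared-error penalty are all assumptions, and the names sphere_alignment_R, proto_emb and proto_sem are hypothetical.

```python
import torch
import torch.nn.functional as F

def sphere_alignment_R(proto_emb, proto_sem, lam=0.5):
    """Illustrative sphere-alignment term R (assumed squared-error form), K >= 2 classes."""
    K = proto_emb.size(0)
    # cos(theta_ij): pairwise cosines between the sphere-embedded prototypes.
    e = F.normalize(proto_emb, dim=1)
    cos_theta = e @ e.t()
    # Similarity (semantic) alignment factor: cosine between the original class prototypes a_i, a_j.
    s = F.normalize(proto_sem, dim=1)
    eta_sim = s @ s.t()
    # Balance (uniform) alignment factor: cosine of K points spread as evenly as possible on the sphere.
    eta_uni = torch.full_like(eta_sim, -1.0 / (K - 1))
    # Combined alignment factor eta*, traded off by lambda in [0, 1].
    eta_star = lam * eta_sim + (1.0 - lam) * eta_uni
    # Penalize deviation of the embedded geometry from the target alignment over off-diagonal pairs.
    off_diag = ~torch.eye(K, dtype=torch.bool, device=proto_emb.device)
    return ((cos_theta - eta_star)[off_diag] ** 2).mean()
```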
Step 2.4: calculate the minimum entropy constraint according to the following formula:
where D_C^tr denotes the training data set, x_n ∼ D_C^tr denotes a sample x_n drawn from the training data set, the expectation is taken over all samples of the data set, and H[q] denotes the entropy of the distribution q; q denotes the probability distribution over class labels y predicted for sample x_n after it passes through the visual feature embedding network φ and the semantic embedding network, given the prototype a; a is the corresponding prototype and y is the predicted class label;
the probability distribution q is calculated as:
where φ denotes the visual feature embedding network and the semantic embedding network provides the prototype embeddings, C denotes the number of known categories, and U denotes the number of unknown categories; a_i denotes the prototype of category i and a_j the prototype of category j; y_i denotes the i-th component of the vector y, i.e. the probability that x_n belongs to category i given the prototypes, the semantic embedding network and the visual feature embedding network;
where the function f_ρ is calculated as follows:
where ρ_1 and ρ_2 are the spherical radius functions corresponding to the visible and invisible categories respectively; the unknown categories are set to have a larger radius than the known categories, that is ρ_2 > ρ_1.
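As a concrete illustration of step 2.4 and of f_ρ, the sketch below assumes a radius-scaled cosine score f_ρ(cos θ) = ρ·cos θ with ρ_2 > ρ_1 for unseen classes and a softmax over the C + U classes. These concrete forms, and the names class_probabilities, calibration_entropy and is_unseen, are assumptions, since the exact formulas appear as figures in the original publication.

```python
import torch
import torch.nn.functional as F

def class_probabilities(x_feat, proto_emb, is_unseen, rho1=1.0, rho2=2.0):
    """q(y | x_n, a) under an assumed score f_rho(cos) = rho * cos, with rho2 > rho1 for unseen classes."""
    x = F.normalize(x_feat, dim=-1)       # visual embedding phi(x_n), placed on the unit sphere
    p = F.normalize(proto_emb, dim=-1)    # sphere-embedded class prototypes (C + U rows)
    cos = x @ p.t()                       # cos(theta) between each sample and every prototype
    rho = torch.full((proto_emb.size(0),), rho1)
    rho[is_unseen] = rho2                 # give unseen classes the larger radius rho2 > rho1
    return (rho * cos).softmax(dim=-1)    # probability distribution over the C + U classes

def calibration_entropy(q):
    """Sphere-calibration term: expected entropy H[q] over the batch."""
    return -(q * (q + 1e-12).log()).sum(dim=-1).mean()
```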
Step 2.5: minimize the following objective function:
where D_C^tr denotes the training data set, x_n ∼ D_C^tr denotes a sample x_n drawn from the training data set, and the expectation is taken over all samples of the data set; D_KL denotes the KL distance between the actual category label ỹ_n of x_n and its predicted label distribution p;
the p function is calculated as follows:
Step 2.6: update the semantic embedding network parameters using back-propagation;
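Putting steps 2.1 through 2.6 together, a minimal training-loop sketch might look as follows. It reuses the helper sketches above (sphere_alignment_R, class_probabilities, calibration_entropy), and the names psi, phi, A_C, A_U and loader, as well as the choice p = q for the predicted distribution, are assumptions rather than the patent's exact definitions.

```python
import torch

def train_semantic_net(psi, phi, A_C, A_U, loader, alpha, beta, lam, iters, lr=1e-3):
    """psi: semantic embedding network (trained); phi: visual embedding network (frozen);
    A_C / A_U: seen / unseen class prototypes; loader yields (image batch, label batch)."""
    opt = torch.optim.Adam(psi.parameters(), lr=lr)
    protos = torch.cat([A_C, A_U], dim=0)
    is_unseen = torch.arange(protos.size(0)) >= A_C.size(0)
    for _, (x, y) in zip(range(iters), loader):            # step 2.1: sample a batch of B samples
        proto_emb = psi(protos)                            # step 2.2: project prototypes onto the sphere
        r = sphere_alignment_R(proto_emb, protos, lam)     # step 2.3: sphere-alignment term R
        q = class_probabilities(phi(x), proto_emb, is_unseen)
        ent = calibration_entropy(q)                       # step 2.4: minimum entropy constraint
        y_onehot = torch.nn.functional.one_hot(y, protos.size(0)).float()
        # Step 2.5: KL distance between the actual (one-hot) label and the prediction; here p = q.
        kl = -(y_onehot * (q + 1e-12).log()).sum(dim=1).mean()
        loss = kl + alpha * r + beta * ent
        opt.zero_grad()
        loss.backward()                                    # step 2.6: back-propagate and update psi
        opt.step()
    return psi
```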
the unseen category sample identification includes:
Input: a test image x_m, the category prototypes A_C of the visible categories and A_U of the unknown categories, the semantic embedding network parameters, and the parameters of the visual feature embedding network φ;
Output: the prediction for the test image;
Step 1: for the test image x_m, calculate its visual representation;
Step 2: project the visible-category prototypes A_C and the unknown-category prototypes A_U into the sphere embedding space, i.e. for each category prototype a in A_C ∪ A_U, generate its sphere embedding through the semantic embedding network;
Step 3: calculate the class prediction value for the test image according to the following formula:
where, to remain consistent with the training data, f_ρ is calculated as follows:
where θ_{n,i} denotes the angle between the prototypes a_n and a_i, and n denotes the category of the sample x_n;
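A minimal sketch of this identification procedure, assuming the same ρ·cos θ scoring used in the training sketches above (the names predict, x_m, psi and phi are hypothetical):

```python
import torch
import torch.nn.functional as F

def predict(x_m, phi, psi, A_C, A_U, rho1=1.0, rho2=2.0):
    """Classify a test image over the C + U classes using the sphere-embedded prototypes."""
    protos = torch.cat([A_C, A_U], dim=0)
    is_unseen = torch.arange(protos.size(0)) >= A_C.size(0)
    with torch.no_grad():
        v = F.normalize(phi(x_m), dim=-1)        # step 1: visual representation of the test image
        e = F.normalize(psi(protos), dim=-1)     # step 2: sphere embeddings of all prototypes
        rho = torch.full((protos.size(0),), rho1)
        rho[is_unseen] = rho2                    # unseen classes keep the larger radius rho2
        scores = rho * (v @ e.t())               # step 3: f_rho-scaled cosine score per class
    return scores.argmax(dim=-1)                 # index of the predicted category
```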
the embodiment of the application adopts the following technical scheme: learning to obtain a semantic embedded network by utilizing an objective function constructed in the whole framework of the systemParameters, thus realizing the fusion of sphere embedding, sphere alignment and sphere calibration into one frame, and solving the problems of semantic gap, pivot and prediction deviation.
The embodiment of the application adopts the following technical scheme: in the semantic embedding network parameter learning process, in the calculation formula of step 2.3, λ ∈ [0,1] is a hyperparameter tuned during experimentation.
The embodiment of the application adopts the following technical scheme: in the semantic embedding network parameter learning process, in the calculation formula of step 2.4, H is the entropy of the probability distribution q of the training set samples.
The embodiment of the application adopts the following technical scheme: in the semantic embedding network parameter learning process, in the calculation formula of step 2.4, C is the number of visible categories and U is the number of invisible categories; φ(x_n) is the visual feature embedding function applied to image x_n, the semantic feature embedding function is applied to the category prototype a, and y_i denotes the i-th component of the vector y, i.e. the probability that x_n belongs to category i given the prototypes, the semantic embedding network and the visual feature embedding network.
The embodiment of the application adopts the following technical scheme: in the semantic embedding network parameter learning process, in the formula of step 2.5, α and β are hyperparameters adjusted according to the experimental data.
The embodiment of the application adopts the following technical scheme: the first term in the formula embodies sphere embedding.
The embodiment of the application adopts the following technical scheme: the second term in the formula, αR(η*), embodies sphere alignment.
The embodiment of the application adopts the following technical scheme: the third term in the formula embodies sphere calibration.
The embodiment of the application adopts the following technical scheme: sphere embedding, sphere alignment and sphere calibration are unified in a single optimization formula to address the semantic gap, pivot and prediction bias problems respectively.
The above at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects:
the main differences between the application and other zero sample learning methods at present are as follows:
1. The joint objective function proposed in the present application, L = E_{x_n∼D_C^tr}[ D_KL(ỹ_n ‖ p) ] + α·R(η*) + β·E_{x_n∼D_C^tr}[ H[q] ], where α and β are hyperparameters tuned during the experiments. The first term of this objective represents sphere embedding, the second term sphere alignment, and the third term sphere calibration. The application unifies sphere embedding, sphere alignment and sphere calibration in a single optimization formula to address the semantic gap, pivot and prediction bias problems respectively;
2. The application maps the distance between the visual features of the image and the semantic description of the category onto a sphere for calculation. The traditional Euclidean distance ignores angle information, and the cosine distance discards radial distance entirely, so the sphere embedding adopted by the application takes both angle information and radial distance into account more comprehensively. Further, the application uses different radial distances for the visible and invisible categories, thereby emphasizing the effect of the unseen-category samples.
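A toy numerical illustration of this radial idea, under the assumed score form ρ·cos θ (the values below are made up for illustration): with equal cosine similarity, the larger radius assigned to an unseen class raises its score relative to a seen class.

```python
# Equal angular similarity, different radii: the unseen class (rho2) outscores the seen class (rho1).
cos_seen, cos_unseen = 0.80, 0.80
rho1, rho2 = 1.0, 2.0
print("seen score:", rho1 * cos_seen, "unseen score:", rho2 * cos_unseen)  # 0.8 vs 1.6
```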
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a system overall frame diagram of a zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration of the present application;
fig. 2 is a schematic diagram of experimental performance of the zero sample learning mechanism of the present application based on sphere embedding, sphere alignment and sphere calibration.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Examples
A zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration, comprising: construction of the overall system framework, a semantic embedding network parameter learning process, and identification of unseen-category samples;
The overall system framework construction includes:
The image is embedded through a visual feature embedding network φ, and the category information is embedded through a semantic embedding network. Using the sphere-embedding KL distance, the sphere-alignment function R and the sphere-calibration minimum entropy constraint, an objective function of the following form is constructed:

L = E_{x_n∼D_C^tr}[ D_KL(ỹ_n ‖ p) ] + α·R(η*) + β·E_{x_n∼D_C^tr}[ H[q] ]

where the first term is the sphere-embedding KL distance, i.e. the KL distance between the distribution ỹ_n of the actual labelled sample and the predicted distribution p (the specific steps are given in step 2.5); R denotes the sphere-alignment function, whose specific calculation formula is given in step 2.3; the third term is the minimum entropy constraint of sphere calibration, whose specific calculation formula is given in step 2.4; α and β are hyperparameters tuned during the experiments;
The semantic embedding network parameter learning process comprises the following steps:
Input: the category prototype set A_C of the visible categories, the category prototype set A_U of the unknown categories, the training data set D_C^tr, and the visual feature embedding network φ;
Output: the semantic embedding network parameters;
Step 1: initialization: set the batch size B and the number of iterations l, and initialize the semantic embedding network parameters;
Step 2: for iteration number iter = [1 : l], perform the following operations:
Step 2.1: randomly sample B samples;
Step 2.2: project the visible-category prototypes A_C and the unknown-category prototypes A_U into the sphere embedding space, i.e. for each category prototype a in A_C ∪ A_U, generate its sphere embedding through the semantic embedding network;
Step 2.3: calculate R according to the following formula:
where i and j denote the class labels of the corresponding prototypes a, cos(θ_{i,j}) denotes the cosine distance between the prototypes of the two categories i and j, and η*_{i,j} denotes the alignment factor between the two categories i and j, which is calculated as follows:
where a_i denotes the prototype of category i and a_j the prototype of category j, both mapped through the semantic embedding network;
where the superscript u denotes the uniform alignment factor, i.e. the balance alignment factor, the superscript s denotes the semantic alignment factor, i.e. the similarity alignment factor, and λ is a balance parameter that weighs semantic alignment against uniform alignment, taking a value in [0,1];
the balance alignment factor is calculated as follows:
the similarity alignment factor is calculated as follows:
Step 2.4: calculate the minimum entropy constraint according to the following formula:
where D_C^tr denotes the training data set, x_n ∼ D_C^tr denotes a sample x_n drawn from the training data set, the expectation is taken over all samples of the data set, and H[q] denotes the entropy of the distribution q; q denotes the probability distribution over class labels y predicted for sample x_n after it passes through the visual feature embedding network φ and the semantic embedding network, given the prototype a; a is the corresponding prototype and y is the predicted class label; the probability distribution q is calculated as:
where φ denotes the visual feature embedding network and the semantic embedding network provides the prototype embeddings, C denotes the number of known categories, and U denotes the number of unknown categories; a_i denotes the prototype of category i and a_j the prototype of category j; y_i denotes the i-th component of the vector y, i.e. the probability that x_n belongs to category i given the prototypes, the semantic embedding network and the visual feature embedding network;
where the function f_ρ is calculated as follows:
where ρ_1 and ρ_2 are the spherical radius functions corresponding to the visible and invisible categories respectively; the unknown categories are set to have a larger radius than the known categories, that is ρ_2 > ρ_1.
Step 2.5: minimize the following objective function:
where D_C^tr denotes the training data set, x_n ∼ D_C^tr denotes a sample x_n drawn from the training data set, and the expectation is taken over all samples of the data set; D_KL denotes the KL distance between the actual category label ỹ_n of x_n and its predicted label distribution p;
the p function is calculated as follows:
Step 2.6: update the semantic embedding network parameters using back-propagation;
the unseen category sample identification includes:
Input: a test image x_m, the category prototypes A_C of the visible categories and A_U of the unknown categories, the semantic embedding network parameters, and the parameters of the visual feature embedding network φ;
Output: the prediction for the test image;
Step 1: for the test image x_m, calculate its visual representation;
Step 2: project the visible-category prototypes A_C and the unknown-category prototypes A_U into the sphere embedding space, i.e. for each category prototype a in A_C ∪ A_U, generate its sphere embedding through the semantic embedding network;
Step 3: calculate the class prediction value for the test image according to the following formula:
where, to remain consistent with the training data, f_ρ is calculated as follows:
where θ_{n,i} denotes the angle between the prototypes a_n and a_i, and n denotes the category of the sample x_n;
The semantic embedding network parameters are learned using the objective function constructed in the overall system framework, thereby fusing sphere embedding, sphere alignment and sphere calibration into one framework and addressing the semantic gap, pivot and prediction bias problems.
In the semantic embedding network parameter learning process, in the calculation formula of step 2.3, λ ∈ [0,1] is a hyperparameter tuned during experimentation; in the calculation formula of step 2.4, H is the entropy of the probability distribution q of the training set samples, C is the number of visible categories and U is the number of invisible categories, φ(x_n) is the visual feature embedding function applied to image x_n, and the semantic feature embedding function is applied to the category prototype a.
In the semantic embedding network parameter learning process, in the formula of step 2.5, α and β are hyperparameters adjusted according to the experimental data.
The first term in the formula embodies sphere embedding; the second term, αR(η*), embodies sphere alignment; the third term embodies sphere calibration; sphere embedding, sphere alignment and sphere calibration are unified in a single optimization formula to address the semantic gap, pivot and prediction bias problems respectively.
To sum up: the joint objective function proposed in the present application, L = E_{x_n∼D_C^tr}[ D_KL(ỹ_n ‖ p) ] + α·R(η*) + β·E_{x_n∼D_C^tr}[ H[q] ], where α and β are hyperparameters tuned during the experiments. The first term represents sphere embedding, the second term sphere alignment, and the third term sphere calibration. The application unifies sphere embedding, sphere alignment and sphere calibration in a single optimization formula to address the semantic gap, pivot and prediction bias problems respectively;
The application maps the distance between the visual features of the image and the semantic description of the category onto a sphere for calculation. The traditional Euclidean distance ignores angle information, and the cosine distance discards radial distance entirely, so the sphere embedding adopted by the application takes both angle information and radial distance into account more comprehensively. Further, the application uses different radial distances for the visible and invisible categories, with ρ_2 > ρ_1, thereby emphasizing the effect of the unseen-category samples.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration, comprising: construction of the overall system framework, a semantic embedding network parameter learning process, and identification of unseen-category samples;
the overall system framework construction includes:
the image is embedded through a visual feature embedding network φ, and the category information is embedded through a semantic embedding network; using the sphere-embedding KL distance, the sphere-alignment function R and the sphere-calibration minimum entropy constraint, an objective function of the following form is constructed:

L = E_{x_n∼D_C^tr}[ D_KL(ỹ_n ‖ p) ] + α·R(η*) + β·E_{x_n∼D_C^tr}[ H[q] ]

where the first term is the sphere-embedding KL distance, i.e. the KL distance between the distribution ỹ_n of the actual labelled sample and the predicted distribution p, and R denotes the sphere-alignment function; the third term is the minimum entropy constraint of sphere calibration, and α and β are hyperparameters tuned during the experiments;
the semantic embedding network parameter learning process comprises the following steps:
input: the category prototype set A_C of the visible categories, the category prototype set A_U of the unknown categories, the training data set D_C^tr, and the visual feature embedding network φ;
output: the semantic embedding network parameters;
step 1: initialization: set the batch size B and the number of iterations l, and initialize the semantic embedding network parameters;
step 2: for iteration number iter = [1 : l], perform the following operations:
step 2.1: randomly sample B samples;
step 2.2: project the visible-category prototypes A_C and the unknown-category prototypes A_U into the sphere embedding space, i.e. for each category prototype a in A_C ∪ A_U, generate its sphere embedding through the semantic embedding network;
step 2.3: calculate R according to the following formula:
where i and j denote the class labels of the corresponding prototypes a, cos(θ_{i,j}) denotes the cosine distance between the prototypes of the two categories i and j, and η*_{i,j} denotes the alignment factor between the two categories i and j:
where a_i denotes the prototype of category i and a_j the prototype of category j, both mapped through the semantic embedding network;
where the superscript u denotes the uniform alignment factor, i.e. the balance alignment factor, the superscript s denotes the semantic alignment factor, i.e. the similarity alignment factor, and λ is a balance parameter that weighs semantic alignment against uniform alignment, taking a value in [0,1],
the balance alignment factor is calculated as follows:
the similarity alignment factor is calculated as follows:
step 2.4: calculate the minimum entropy constraint according to the following formula:
where D_C^tr denotes the training data set, x_n ∼ D_C^tr denotes a sample x_n drawn from the training data set, the expectation is taken over all samples of the data set, and H[q] denotes the entropy of the distribution q; q denotes the probability distribution over class labels y predicted for sample x_n after it passes through the visual feature embedding network φ and the semantic embedding network, given the prototype a; a is the corresponding prototype, y is the predicted class label, and the probability distribution q is calculated as:
where φ denotes the visual feature embedding network and the semantic embedding network provides the prototype embeddings, C denotes the number of known categories, U denotes the number of unknown categories, a_i denotes the prototype of category i, a_j denotes the prototype of category j, and y_i denotes the i-th component of the vector y, i.e. the probability that x_n belongs to category i given the prototypes, the semantic embedding network and the visual feature embedding network,
where the function f_ρ is calculated as follows:
where ρ_1 and ρ_2 are the spherical radius functions corresponding to the visible and invisible categories respectively, with ρ_2 > ρ_1;
step 2.5: minimize the following objective function:
where D_C^tr denotes the training data set, x_n ∼ D_C^tr denotes a sample x_n drawn from the training data set, and the expectation is taken over all samples of the data set; D_KL denotes the KL distance between the actual category label ỹ_n of x_n and its predicted label distribution p,
and the p function is calculated as follows:
step 2.6: update the semantic embedding network parameters using back-propagation;
the unseen category sample identification includes:
input: a test image x_m, the category prototypes A_C of the visible categories and A_U of the unknown categories, the semantic embedding network parameters, and the parameters of the visual feature embedding network φ;
output: the prediction for the test image;
step 1: for the test image x_m, calculate its visual representation;
step 2: project the visible-category prototypes A_C and the unknown-category prototypes A_U into the sphere embedding space, i.e. for each category prototype a in A_C ∪ A_U, generate its sphere embedding through the semantic embedding network;
step 3: calculate the class prediction value for the test image according to the following formula:
where, to remain consistent with the training data, f_ρ is calculated as follows:
where θ_{n,i} denotes the angle between the prototypes a_n and a_i, and n denotes the category of the sample x_n.
2. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 1, wherein the semantic embedding network parameters are learned using the objective function constructed in the overall system framework, thereby fusing sphere embedding, sphere alignment and sphere calibration into one framework and addressing the semantic gap, pivot and prediction bias problems.
3. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 1, wherein in the semantic embedding network parameter learning process, in the calculation formula of step 2.3, λ ∈ [0,1] is a hyperparameter tuned during experimentation.
4. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 1, wherein in the semantic embedding network parameter learning process, in the calculation formula of step 2.4, H is the entropy of the probability distribution q of the training set samples.
5. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 1, wherein in the semantic embedding network parameter learning process, in the calculation formula of step 2.4, C is the number of visible categories and U is the number of invisible categories; φ(x_n) is the visual feature embedding function applied to image x_n, the semantic feature embedding function is applied to the category prototype a, and y_i denotes the i-th component of the vector y, i.e. the probability that x_n belongs to category i given the prototypes, the semantic embedding network and the visual feature embedding network.
6. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 1, wherein in the semantic embedding network parameter learning process, in the formula of step 2.5, α and β are hyperparameters adjusted according to the experimental data.
7. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 6, wherein the first term in the formula embodies sphere embedding.
8. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 6, wherein the second term in the formula, αR(η*), embodies sphere alignment.
9. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 6, wherein the third term in the formula embodies sphere calibration.
10. The zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration according to claim 6, wherein sphere embedding, sphere alignment and sphere calibration are unified in a single optimization formula to address the semantic gap, pivot and prediction bias problems respectively.
CN202011629663.1A 2020-12-31 2020-12-31 Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration Active CN112686318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011629663.1A CN112686318B (en) 2020-12-31 2020-12-31 Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011629663.1A CN112686318B (en) 2020-12-31 2020-12-31 Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration

Publications (2)

Publication Number Publication Date
CN112686318A CN112686318A (en) 2021-04-20
CN112686318B true CN112686318B (en) 2023-08-29

Family

ID=75455944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011629663.1A Active CN112686318B (en) 2020-12-31 2020-12-31 Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration

Country Status (1)

Country Link
CN (1) CN112686318B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110516718A (en) * 2019-08-12 2019-11-29 西北工业大学 The zero sample learning method based on depth embedded space
WO2020156303A1 (en) * 2019-01-30 2020-08-06 广州市百果园信息技术有限公司 Method and apparatus for training semantic segmentation network, image processing method and apparatus based on semantic segmentation network, and device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017166137A1 (en) * 2016-03-30 2017-10-05 中国科学院自动化研究所 Method for multi-task deep learning-based aesthetic quality assessment on natural image
US11087174B2 (en) * 2018-09-25 2021-08-10 Nec Corporation Deep group disentangled embedding and network weight generation for visual inspection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020156303A1 (en) * 2019-01-30 2020-08-06 广州市百果园信息技术有限公司 Method and apparatus for training semantic segmentation network, image processing method and apparatus based on semantic segmentation network, and device and storage medium
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110516718A (en) * 2019-08-12 2019-11-29 西北工业大学 The zero sample learning method based on depth embedded space

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Towards Effective Deep Embedding; Lei Zhang et al.; IEEE; 2020-09-30; Vol. 30, No. 9; pp. 2843-2852 *

Also Published As

Publication number Publication date
CN112686318A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
Brachmann et al. Dsac-differentiable ransac for camera localization
US20180240243A1 (en) Segmenting three-dimensional shapes into labeled component shapes
US20220067588A1 (en) Transforming a trained artificial intelligence model into a trustworthy artificial intelligence model
CN108021930A (en) A kind of adaptive multi-view image sorting technique and system
CN110597956B (en) Searching method, searching device and storage medium
CN111052128B (en) Descriptor learning method for detecting and locating objects in video
CN111127364A (en) Image data enhancement strategy selection method and face recognition image data enhancement method
US20230049817A1 (en) Performance-adaptive sampling strategy towards fast and accurate graph neural networks
CN111161249A (en) Unsupervised medical image segmentation method based on domain adaptation
WO2020256732A1 (en) Domain adaptation and fusion using task-irrelevant paired data in sequential form
CN110443273B (en) Zero-sample-confrontation learning method for cross-class identification of natural images
CN111159241A (en) Click conversion estimation method and device
CN112686318B (en) Zero sample learning mechanism based on sphere embedding, sphere alignment and sphere calibration
CN111062406B (en) Heterogeneous domain adaptation-oriented semi-supervised optimal transmission method
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN110717037A (en) Method and device for classifying users
US20240020531A1 (en) System and Method for Transforming a Trained Artificial Intelligence Model Into a Trustworthy Artificial Intelligence Model
CN114255381B (en) Training method of image recognition model, image recognition method, device and medium
CN110135507A (en) A kind of label distribution forecasting method and device
CN114595787A (en) Recommendation model training method, recommendation device, medium and equipment
Han et al. Vanishing point detection and line classification with BPSO
CN112329833A (en) Image metric learning method based on spherical surface embedding
CN111523649A (en) Method and device for preprocessing data aiming at business model
Lee Accumulating conversational skills using continual learning
CN115984653B (en) Construction method of dynamic intelligent container commodity identification model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant