CN112529772A - Unsupervised image conversion method under zero sample setting - Google Patents

Unsupervised image conversion method under zero sample setting

Info

Publication number
CN112529772A
Authority
CN
China
Prior art keywords
attribute
image
unseen
space
visual
Prior art date
Legal status
Granted
Application number
CN202011501620.5A
Other languages
Chinese (zh)
Other versions
CN112529772B (en)
Inventor
陈元祺
余晓铭
刘杉
李革
Current Assignee
Institute Of Intelligent Video Audio Technology Longgang Shenzhen
Original Assignee
Institute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority date
Filing date
Publication date
Application filed by Institute Of Intelligent Video Audio Technology Longgang Shenzhen
Priority to CN202011501620.5A
Priority claimed from CN202011501620.5A
Publication of CN112529772A
Application granted
Publication of CN112529772B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An unsupervised image conversion method under the zero sample setting comprises applying an attribute-visual correlation constraint and extending the attribute space with unseen class attributes, wherein the two steps are carried out synchronously. By applying the attribute-visual correlation constraint and extending the attribute space with unseen attributes, the model is encouraged to fully exploit the attribute features of each category, thereby achieving unsupervised image conversion under the zero sample setting.

Description

Unsupervised image conversion method under zero sample setting
Technical Field
The invention relates to the field of image generation and image conversion, in particular to an unsupervised image conversion method under the zero sample setting.
Background
In recent years, with the development of generative adversarial networks (GANs), generative models have received increasing attention. On the one hand, GAN-based generative models achieve striking results, producing high-resolution images realistic enough to pass for real ones; on the other hand, as the famous physicist Richard Feynman put it, "What I cannot create, I do not understand." Although recent machine learning models excel at tasks such as image classification, the success of these applications does not show that we truly understand images or have truly achieved intelligence. The ability to generate images is therefore significant for a deeper understanding of images.
Image-to-image translation is a branch of generative modeling; it belongs to the family of conditional generative models, with the input image serving as the condition. It addresses how to convert an image from one domain into the corresponding image in another domain, for example converting an image taken in the daytime into a night scene while keeping the scene unchanged. This is a challenging task. First, the output of the model should be both realistic and carry the characteristics of the target domain it is converted to; second, the model should preserve the individual characteristics of the input, rather than producing a completely different picture after conversion. The failure described in the second point is also called mode collapse: the outputs collapse into a few modes, and the network produces the same single result even when given different inputs.
The above problems can be solved well in the supervised case. When paired datasets are available (e.g., a daytime image and a nighttime image of the same scene), the image converted from the source domain to the target domain can be constrained to approximate its ground-truth counterpart. However, in many real-world scenarios, paired samples cannot be obtained at low cost, or do not exist at all. In this case, how to train an image conversion model without supervision is a difficulty. Furthermore, the mode collapse problem is particularly acute when some classes have few samples, or even no samples at all. In summary, unsupervised image conversion under the zero sample setting is a challenging problem.
Disclosure of Invention
The invention provides an unsupervised image conversion method under the zero sample setting, which realizes unsupervised image conversion with zero samples.
The technical scheme of the invention is as follows:
the unsupervised image conversion method under the zero sample setting comprises the steps of applying attribute-visual relevance constraint and extending an attribute space by using an unseen attribute, wherein the application of the attribute-visual relevance constraint and the extension of the attribute space by using the unseen attribute are synchronously carried out.
Preferably, in the above unsupervised image conversion method under the zero sample setting, applying the attribute-visual correlation constraint comprises the steps of: sampling two seen class attributes a_m and a_n from the attribute space, and computing the correlation s(a_m, a_n) between them; computing, according to the adaptive instance normalization (AdaIN) method from style transfer, the visual features w_m and w_n of the visual space determined by the two seen class attributes a_m and a_n, and computing the correlation s(w_m, w_n) between them; and applying the correlation constraint: for the two seen class attributes a_m and a_n and the visual features w_m and w_n they determine, imposing the constraint regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2.
Preferably, in the unsupervised image conversion method under the zero sample setting, extending the attribute space with unseen attributes comprises the following steps: sampling an unseen class attribute a_u and an input image x_i, and generating an image x_t with the generator; constraining the generated image x_t, through an attribute loss function, to exhibit the features of the unseen class attribute a_u; and performing attribute regression with the discriminator to extend the attribute space.
According to the technical scheme of the invention, the beneficial effects are as follows:
the method of the invention promotes the model to fully utilize the attribute characteristics of the category by applying the attribute-visual relevance constraint and utilizing the unseen attribute to expand the attribute space, thereby realizing the unsupervised image conversion under the zero sample.
For a better understanding of the concept, working principle, and effects of the invention, the invention is described in detail below through specific embodiments with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention.
Fig. 1 is an overall framework diagram of the unsupervised image conversion method under the zero sample setting of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any creative effort, shall fall within the protection scope of the present invention.
The image conversion model involved in the unsupervised image conversion method under the zero sample setting of the present invention is based on a generative adversarial network and comprises a generator and a discriminator (shown in Fig. 1). Training a generative adversarial network is a minimax game: the goal of the generator is to produce samples realistic enough to fool the discriminator, while the discriminator tries to distinguish samples from the true data distribution from generated ones. When training reaches a stable stage, the generator is able to produce higher-quality samples, which the discriminator also finds difficult to distinguish from real samples.
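To make the adversarial training concrete, the following is a minimal sketch of one training step, assuming a PyTorch implementation with a conditional generator G(image, attribute) and a discriminator D; the binary cross-entropy loss and all names here are illustrative assumptions, not the invention's exact formulation.

```python
# Minimal, illustrative GAN training step (PyTorch). G, D, the optimizers and
# the data batch are assumed to be defined elsewhere; the adversarial loss is
# an assumption, not the patent's exact loss.
import torch
import torch.nn.functional as F

def gan_step(G, D, real_images, attrs, opt_g, opt_d):
    # Discriminator step: score real samples as real, generated samples as fake.
    fake_images = G(real_images, attrs).detach()
    d_real, d_fake = D(real_images), D(fake_images)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator score generated samples as real.
    d_fake = D(G(real_images, attrs))
    loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```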
Under the zero sample setting, image samples are missing for a subset of the classes, referred to as the unseen classes. For unseen classes, the only available knowledge is the attribute features each of them holds. A zero-sample image conversion model takes an image to be converted and a class attribute as input, and converts the image into the target class. The key to the problem is how to transfer knowledge from the seen classes to the unseen classes, and how to make the model perform the conversion conditioned on the class attributes.
The working principle of the invention is as follows: by applying the attribute-visual correlation constraint and extending the attribute space with unseen attributes, the model is encouraged to fully exploit class attribute features, thereby achieving unsupervised image conversion under the zero sample setting.
The attribute-visual correlation constraint requires that the correlation of a pair of attributes in the attribute space and the correlation of the pair of images converted according to those attributes be consistent. Since no image samples of the unseen classes are available in the training phase, the attribute vectors are used to provide effective guidance for image conversion. Introducing the attribute-visual correlation constraint guides the learned visual space to mirror the structure of the attribute space and facilitates image conversion for unseen classes.
Extending the attribute space with unseen attributes means using the attributes of unseen classes during training. Although image samples of the unseen classes cannot be obtained, their class attributes are available. In the training phase, an unseen attribute is fed into the image conversion model together with the input image, and the model is required to recover the unseen attribute from the converted image. This strategy mitigates the mapping bias in zero-sample image conversion, namely the tendency of the conversion model to map unseen classes onto similar seen classes, thereby narrowing the conversion performance gap between seen and unseen classes.
Fig. 1 is the overall framework diagram of the unsupervised image conversion method under the zero sample setting of the present invention. As shown in the figure, the method comprises two strategies, namely Strategy 1: applying the attribute-visual correlation constraint, and Strategy 2: extending the attribute space with unseen attributes, where the two strategies are carried out synchronously.
Applying the attribute-visual correlation constraint comprises the following steps (a minimal code sketch is given after this list):
1) Sample two seen class attributes a_m and a_n from the attribute space (i.e., attribute 1 and attribute 2 in the attribute space of Fig. 1), and compute the correlation s(a_m, a_n) between them.
2) Following the adaptive instance normalization (AdaIN) method from style transfer, compute the visual features w_m and w_n of the visual space determined by a_m and a_n (corresponding to image 1 and image 2 in the visual space of Fig. 1), and compute the correlation s(w_m, w_n) between them.
3) Apply the correlation constraint: for the two seen class attributes a_m and a_n and the visual features w_m and w_n they determine, impose the constraint regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2.
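As referenced above, the following minimal sketch illustrates the correlation constraint. The text does not specify the correlation measure s(·,·) (its formulas appear only as images in the original publication), so cosine similarity is assumed here purely for illustration; `mapping` stands for the assumed AdaIN-style network that maps a class attribute to a visual feature.

```python
# Hedged sketch of the attribute-visual correlation regularizer L_reg.
import torch
import torch.nn.functional as F

def correlation_regularizer(a_m, a_n, mapping):
    # Correlation of the two seen class attributes in attribute space
    # (cosine similarity is an assumption; the patent does not give the formula).
    s_attr = F.cosine_similarity(a_m, a_n, dim=-1)
    # Visual features determined by the two attributes (AdaIN-style mapping).
    w_m, w_n = mapping(a_m), mapping(a_n)
    s_vis = F.cosine_similarity(w_m, w_n, dim=-1)
    # L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2
    return ((s_attr - s_vis) ** 2).sum()
```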
Extending the attribute space with unseen attributes comprises the following steps (a minimal code sketch is given after this list):
1) Sample an unseen class attribute a_u and an input image x_i, and generate an image x_t with the generator.
2) Through an attribute loss function, constrain the generated image x_t to exhibit the features of the unseen class attribute a_u.
3) Perform attribute regression with the discriminator to extend the attribute space.
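As referenced above, the following minimal sketch illustrates this strategy. The exact attribute loss is given only as an image in the original publication, so a plain L2 attribute-regression loss and a hypothetical discriminator attribute head `regress_attr` are assumed here for illustration.

```python
# Hedged sketch of extending the attribute space with an unseen attribute.
def unseen_attribute_step(G, D, x_i, a_u):
    # 1) Convert the input image x_i toward the unseen class attribute a_u.
    x_t = G(x_i, a_u)
    # 2)-3) Constrain x_t to carry the unseen attribute: a (hypothetical)
    # attribute-regression head of the discriminator should recover a_u from x_t.
    a_pred = D.regress_attr(x_t)
    loss_attr = ((a_pred - a_u) ** 2).mean()  # assumed L2 regression loss
    return x_t, loss_attr
```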
Compared with existing image conversion methods, the method provided by the invention achieves better conversion accuracy and better generation quality. The two concepts of conversion accuracy and generation quality in image conversion, together with their related evaluation metrics, are explained below.
Conversion accuracy: measures whether the converted image belongs to the target domain. Generally, a pre-trained classifier is used to estimate the probability that the converted image belongs to the target domain. The evaluation metrics include Top-1 and Top-5 classification accuracy: for a given picture, the prediction is counted as correct if the correct class is among the top one (or top five) classes ranked by predicted probability.
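For concreteness, a minimal sketch of Top-k classification accuracy follows, assuming PyTorch; the names `logits` (pre-trained classifier scores for the converted images) and `targets` (target-domain labels) are illustrative.

```python
# Top-k classification accuracy: correct if the target class is among the
# k highest-scoring classes.
import torch

def topk_accuracy(logits, targets, k=5):
    topk = logits.topk(k, dim=-1).indices            # (N, k) class indices
    hits = (topk == targets.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()
```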
Generation quality: measures whether the converted image has high image quality. Evaluation metrics are divided into objective and subjective ones. The Fréchet Inception Distance (FID) is a commonly used objective measure of generation quality. To compute the FID of an image conversion model, a batch of converted images is first generated with the model, and a batch of real images is sampled from the dataset for comparison. Features are then extracted from both batches, their statistics are computed, and the difference between the distributions of the generated and real images, measured from these statistics, serves as the assessment of generation quality. For subjective evaluation, the conversion results of several models are typically presented to subjects at the same time, and each subject is asked to pick the image with the highest quality. After a large number of trials, the model with the higher selection rate is judged to have the higher generation quality.
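The objective FID computation described above can be sketched as follows, assuming the standard Fréchet distance between two Gaussians fitted to the extracted features; extracting `feats_real` and `feats_fake` with a pre-trained feature network is assumed to happen elsewhere.

```python
# FID from feature statistics: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2*sqrt(C1*C2)).
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product; keep the real part to
    # discard small imaginary components caused by numerical error.
    covmean = linalg.sqrtm(c1 @ c2).real
    return float(((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2 - 2.0 * covmean))
```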
Table 1 compares the objective metrics of the present invention with those of other algorithms, including the conversion accuracy for seen and unseen classes and the generation quality metric FID. Compared with existing models (where FUNIT-1 and FUNIT-5 are not under the zero sample setting, making the comparison unfair in their favor, while StarGAN is under the zero sample setting), the method of the present invention obtains better results on both the CUB and FLO datasets, and the improvement is especially remarkable for the unseen classes.
Table 1. Comparison of the present invention with other algorithms on objective metrics.
(The contents of Table 1 are provided only as an image in the original publication.)
As shown in Table 2, for the subjective evaluation, when the conversion results of several models were presented to subjects at the same time, the selection rate of the present invention was much higher than that of StarGAN, which is likewise under the zero sample setting; the present invention also shows competitive results against FUNIT-1 and FUNIT-5, which operate under a few-shot setting.
Table 2. Comparison of the present invention with other algorithms on subjective metrics (selection rate).
Model          CUB dataset    FLO dataset
FUNIT-1        27.8%          21.8%
FUNIT-5        34.2%          27.8%
StarGAN        7.8%           14.3%
The invention  30.2%          36.1%
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent replacements, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. An unsupervised image conversion method under the zero sample setting, characterized by comprising applying an attribute-visual correlation constraint and extending the attribute space with unseen attributes, wherein applying the attribute-visual correlation constraint and extending the attribute space with unseen attributes are carried out synchronously.
2. The unsupervised image conversion method under the zero sample setting of claim 1, wherein said applying the attribute-visual correlation constraint comprises the steps of:
sampling two seen class attributes a_m and a_n from the attribute space, and computing the correlation s(a_m, a_n) between them;
computing, according to the adaptive instance normalization (AdaIN) method from style transfer, the visual features w_m and w_n of the visual space determined by the two seen class attributes a_m and a_n, and computing the correlation s(w_m, w_n) between them; and
applying the correlation constraint: for the two seen class attributes a_m and a_n and the visual features w_m and w_n they determine, imposing the constraint regularization term L_reg = ||s(a_m, a_n) - s(w_m, w_n)||^2.
3. The unsupervised image conversion method under the zero sample setting according to claim 1, wherein said extending the attribute space with unseen attributes comprises the steps of:
sample not found class attribute auAnd an input image xiGenerating an image x with a generatort
Passing loss function
Figure FDA0002843778760000013
Constraining the image xtMake it have the unseen category attribute auThe features of (1); and
and performing attribute regression by using a discriminator to expand an attribute space.
CN202011501620.5A 2020-12-18 Unsupervised image conversion method under zero sample setting Active CN112529772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011501620.5A CN112529772B (en) 2020-12-18 Unsupervised image conversion method under zero sample setting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011501620.5A CN112529772B (en) 2020-12-18 Unsupervised image conversion method under zero sample setting

Publications (2)

Publication Number Publication Date
CN112529772A 2021-03-19
CN112529772B 2024-05-28




Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740888A (en) * 2016-01-26 2016-07-06 Tianjin University Joint embedding model for zero-shot learning
CN109359670A (en) * 2018-09-18 2019-02-19 Beijing University of Technology Automatic detection method for individual association strength based on traffic big data
CN109598279A (en) * 2018-09-27 2019-04-09 Tianjin University Zero-shot learning method based on auto-encoding adversarial generative networks
CN109582960A (en) * 2018-11-27 2019-04-05 Shanghai Jiao Tong University Zero-shot learning method based on structured association semantic embedding
CN110097095A (en) * 2019-04-15 2019-08-06 Tianjin University Zero-shot classification method based on multi-view generative adversarial networks
CN110163796A (en) * 2019-05-29 2019-08-23 North Minzu University Unsupervised multi-modal adversarial auto-encoding image generation method and framework
CN110795585A (en) * 2019-11-12 2020-02-14 Fuzhou University Zero-shot image classification model based on generative adversarial networks and method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HOU Yubing: "Research on Image Style Transfer Methods", China New Telecommunications, no. 17, 5 September 2020 (2020-09-05) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI769820B (en) * 2021-05-19 2022-07-01 Hon Hai Precision Industry Co., Ltd. Method for optimizing a generative adversarial network and electronic equipment

Similar Documents

Publication Publication Date Title
Cheng et al. An analysis of generative adversarial networks and variants for image synthesis on MNIST dataset
Wang et al. Minegan: effective knowledge transfer from gans to target domains with few images
Hoshen et al. Non-adversarial image synthesis with generative latent nearest neighbors
WO2020216033A1 (en) Data processing method and device for facial image generation, and medium
Jolicoeur-Martineau On relativistic f-divergences
Yuan et al. Neighborloss: a loss function considering spatial correlation for semantic segmentation of remote sensing image
CN113255895A (en) Graph neural network representation learning-based structure graph alignment method and multi-graph joint data mining method
Teo et al. Fair generative models via transfer learning
CN113505855A (en) Training method for anti-attack model
Shariff et al. Artificial (or) fake human face generator using generative adversarial network (gan) machine learning model
Ning et al. Continuous learning of face attribute synthesis
CN112529772A (en) Unsupervised image conversion method under zero sample setting
Zhang et al. Improved procedures for training primal wasserstein gans
CN112529772B (en) Unsupervised image conversion method under zero sample setting
CN116232699A (en) Training method of fine-grained network intrusion detection model and network intrusion detection method
CN114494819B (en) Anti-interference infrared target identification method based on dynamic Bayesian network
CN115309985A (en) Fairness evaluation method and AI model selection method of recommendation algorithm
Cao et al. Searching for better spatio-temporal alignment in few-shot action recognition
Li et al. A method for face fusion based on variational auto-encoder
CN114170426A (en) Algorithm model for classifying rare tumor category small samples based on cost sensitivity
Lyu et al. DeCapsGAN: generative adversarial capsule network for image denoising
Hu et al. Crowd R-CNN: An object detection model utilizing crowdsourced labels
Wang et al. Real-time and accurate face detection networks based on deep learning
Zaji et al. Wheat spike counting using regression and localization approaches
CN116821408B (en) Multi-task consistency countermeasure retrieval method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant