CN113052230A - Clothing image generation system and method based on disentanglement network


Info

Publication number
CN113052230A
CN113052230A
Authority
CN
China
Prior art keywords
network
image
clothing
discriminator
disentanglement
Prior art date
Legal status
Pending
Application number
CN202110304774.3A
Other languages
Chinese (zh)
Inventor
张建明
宋阳
王志坚
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110304774.3A priority Critical patent/CN113052230A/en
Publication of CN113052230A publication Critical patent/CN113052230A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/241 — Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/213 — Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F 18/2148 — Generating training patterns; bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/045 — Neural networks; architecture; combinations of networks
    • G06N 3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • G06Q 30/0621 — Electronic shopping [e-shopping]; item configuration or customization

Abstract

The invention relates to the field of computer vision, and in particular to a clothing image generation system and method based on a disentanglement network. The method comprises the following steps: step S101, acquiring a plurality of clothing images with category labels; step S102, acquiring a color label for each clothing image and concatenating it with the category label; step S103, training a disentanglement neural network, initializing the generator and discriminator network parameters of the disentanglement neural network; step S104, discriminating between the real image and the clothing image produced by the disentanglement generator; and step S105, adjusting and optimizing the parameters of the disentanglement network according to the discrimination value and the output image. The invention combines artificial intelligence technology with the traditional clothing industry, overcomes problems such as monotonous designs, low user satisfaction and high design cost in the traditional clothing industry, improves the efficiency of clothing design, and ensures that the designed clothing fully meets users' design requirements.

Description

Clothing image generation system and method based on disentanglement network
Technical Field
The invention relates to the field of computer vision, and in particular to a clothing image generation system and method based on a disentanglement network.
Background
Computer vision technology has many concrete applications in fields such as image generation, translation and restoration. A conditional generative adversarial network can generate, from an input label, a garment image that is indistinguishable from a real one, providing inspiration for garment designers and consumers. The key to garment image generation is diversified style design, which requires the network to disentangle the features of the garment image well, such as its color features and shape features. Meanwhile, a user can select the category or color style of the garment to be generated, which solves the customized garment design problem to a certain extent and realizes user-centered intelligent garment design, drawing design inspiration from the user's requirements. All the generated garment images are produced from scratch according to the real image distribution; conditional random generation ensures the diversity of the generated garment images while enabling customized generation according to user requirements. At the same time, it can provide design inspiration for garment designers, facilitating the combination of the garment industry with artificial intelligence technology and promoting the development of the garment industry.
Of course, there are many difficulties in designing garment images with artificial intelligence techniques, mainly the following:
1) generative adversarial networks are difficult to train and hard to converge;
2) the randomness of the generated images;
3) clothing images have many features and complex textures and patterns, which easily cause feature entanglement and image blurring;
4) high-resolution, high-definition images are hard to generate.
Disclosure of Invention
In order to remedy the defects of the prior art, the invention provides a clothing image generation method based on a disentanglement network, which generates clothing images of various styles according to user conditions and provides clothing design inspiration for consumers and designers. The specific technical scheme is as follows:
a clothing image generation method based on an disentanglement neural network comprises the following steps:
s101, acquiring a plurality of clothing images with category labels;
s102, acquiring a color label of the clothing image, and cascading the color label with a clothing type label;
s103, training a de-entanglement neural network, and initializing a de-entanglement generator parameter and a discriminator network parameter of the de-entanglement neural network;
s104, inputting the cascaded labels into the de-entanglement neural network, and judging a real image and a clothing image generated by the de-entanglement generator;
and S105, adjusting and optimizing the disentanglement network parameters according to the judgment value and the output clothing image.
Further, the category label and the color label of the clothing image are obtained by one-hot encoding, where the classification of clothing image colors uses the OpenCV tool to convert the RGB model of the clothing image into an HSV model.
Further, step S103 specifically includes:
s103_1, training a de-entanglement neural network, wherein the de-entanglement neural network is a conditional countermeasure network and comprises a de-entanglement generator G and a multi-stage discriminator D, the de-entanglement generator generates a picture after extracting the style features of the clothing image, the multi-stage discriminator discriminates the real picture from the generated picture, and the input of the de-entanglement neural network is a category label l of the clothing imageclassColor label lcolorAnd a random noise variable z, the disentangle generator output G (z, (l)class,lcolor) Log (G (z, (l)) is output from the multi-stage discriminatorclass,lcolor) )) and log (I)real) Corresponding to the discrimination results of the generated picture and the real picture by the multi-stage discriminator, respectively, wherein IrealIs the concatenation of the real garment image and its label;
the overall objective function during training is:
Figure BDA0002986573760000021
i.e., the overall GAN loss function is:
Figure BDA0002986573760000022
among them, the above-mentioned materials are used,
Figure BDA0002986573760000023
respectively, subject to the discrimination result expectation of the true distribution and the de-entanglement generator generated distribution,
Figure BDA0002986573760000024
training processes of the minimum de-entanglement generator of the discriminator to generate a distribution discrimination expectation and a maximum true distribution discrimination expectation, respectively,/true、xtrueL respectively representing a label of the real clothing image, the real clothing image and a label for generating the clothing image;
s103_2, performing spectrum normalization on all network layers of the de-entanglement generator and the multi-stage discriminator, wherein the weight initialization of all network layers is subjected to Gaussian distribution, the mean value is 0, and the variance is 1.
Further, the multi-stage discriminator is composed of a local discriminator and a global discriminator, which perform down-sampling discrimination of the real picture and the generated picture at two different scales; the sampling results are finally combined to obtain the discrimination result of the multi-stage discriminator.
Further, the intermediate feature outputs of the multi-stage discriminator for the generated image are matched against those for the real image, with the feature matching loss function:

    L_FM(G, D_k) = E_{(x_true, l_true)} Σ_{i=1}^{T} (1/N_i) ||D_k^(i)(x_true, l_true) − D_k^(i)(G(z, l), l)||_1

where T is the total number of network layers, N_i denotes the number of elements in layer i, and D_k denotes a sub-discriminator acting as a feature extractor; the minimization of the feature matching loss, min_G L_FM(G, D_k), is performed only when training the disentanglement generator.
Further, the disentanglement generator is composed of a mapping network ω and a progressive generation network G_progress. The mapping network ω consists of fully-connected network layers; random noise and the label encoding are mapped into an intermediate latent space, and the output intermediate latent code ψ = (ψ_style, ψ_bias) controls the parameters of an adaptive instance normalization layer, whose normalization function is:

    AdaIN(x_i, ψ) = ψ_style,i · (x_i − μ(x_i)) / σ(x_i) + ψ_bias,i

where each feature map x_i is normalized separately, then scaled by the intermediate latent code ψ_style and shifted by the offset ψ_bias. The progressive generation network G_progress is a stack of convolution modules with adaptive instance normalization layers; each convolution module upsamples by linear interpolation with a magnification factor of 2, its inputs are the intermediate latent code and random Gaussian noise obeying a Gaussian distribution with mean 0 and variance 1, and finally the output is converted into an RGB image by a convolution layer with a 1 × 1 kernel.
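The adaptive instance normalization above normalizes each feature map separately, then scales it by ψ_style and shifts it by ψ_bias. A small NumPy sketch, assuming (C, H, W) feature maps and per-channel latent codes (the shapes are illustrative):

```python
import numpy as np

def adain(x, psi_style, psi_bias, eps=1e-5):
    """Adaptive instance normalization: normalize each feature map x_i
    to zero mean / unit variance over its spatial axes, then scale by
    psi_style and shift by psi_bias.
    x: (C, H, W) feature maps; psi_style, psi_bias: (C,) latent codes."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)
    return psi_style[:, None, None] * x_norm + psi_bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 16))   # 8 feature maps
psi_style = rng.normal(size=8)     # "scale" half of the latent code
psi_bias = rng.normal(size=8)      # "bias" half of the latent code
y = adain(x, psi_style, psi_bias)
```

Because each output channel's mean is exactly ψ_bias and its scale is |ψ_style|, the latent code fully dictates the per-channel statistics, which is how the style is injected; this also makes concrete why ψ has twice as many entries as there are feature maps.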
Further, the mapping network maps the input signal into a latent space variable ω, which a mapping transformation then converts into a style variable y = (y_s, y_b) output to the progressive generation network G_progress; y_s and y_b are the scaling factor and the bias respectively. After each convolution module, the style variable controls the parameters of adaptive instance normalization, whose operation is expressed as:

    AdaIN(x_i, y) = y_s,i · (x_i − μ(x_i)) / σ(x_i) + y_b,i

where each feature map x_i is regularized separately and then normalized with the style variable y, so the dimension of y is twice the number of feature maps of that network layer.
Further, step S104 specifically includes: the real image together with its category label and color style label is fed to the multi-stage discriminator to be judged true or false; at the start of training, the multi-stage discriminator judges the real image with its matched label as true and the clothing image generated by the disentanglement generator as false.
Further, step S105 specifically includes: in each training round, the parameters are optimized according to the discrimination value and the objective function; specifically, the features of the generated images are extracted by a VGG network, and a perceptual loss function is added in the training process:

    L_percep = Σ_{i=1}^{T} (1/M_i) ||F^(i)(x_true) − F^(i)(G(z, l))||_1

where F^(i) denotes the i-th layer of the VGG network, containing M_i elements. The objective function is:

    min_G ( (max_D Σ_k L_GAN(G, D_k)) + λ1 Σ_k L_FM(G, D_k) + λ2 L_percep )

where λ1 and λ2 are hyper-parameters that need to be tuned during training. After each iteration of the network, the parameters of the disentanglement generator and the multi-stage discriminator are updated by the gradient descent method: the generator parameters θ_G are updated 5 times for each single update of the discriminator parameters θ_D, and the error is back-propagated.
A clothing image generation system based on a disentanglement neural network, comprising:
a user registration module: confirming the identity information of the user, and recording the clothes type and the color style preferred by the user;
an input conversion module: receiving a clothing design requirement input by a user, and converting the clothing design requirement into an input corresponding to a model;
a clothing image design generation module: the converted clothing design requirements of the user are fed into the trained model; the disentanglement generation network receives the noise signal and the condition input and outputs a corresponding clothing image;
a display module: the system displays the clothing image output by the disentanglement network to the user, and the user can select the favorite and satisfactory image.
The method adopts a conditional disentanglement generation network, which preserves elements such as the textures and patterns of the clothing well, realizes customized design according to user requirements, and improves the diversity and specificity of the user experience. By adopting generative adversarial training, in which generator and discriminator improve together, the quality and effect of the generated images can be improved continuously. Once the algorithm is trained, the system can efficiently generate customized clothing images in large quantities and perform clothing design according to user conditions, producing the clothing category and color style the user requires.
Drawings
FIG. 1 is a schematic block diagram of the process flow of the present invention;
FIG. 2 is a flow schematic block diagram of the system architecture of the present invention;
FIG. 3 is a schematic representation of a sample garment image used in the training of the method of the present invention;
FIG. 4 is a schematic diagram of the overall structure of a disentanglement network used in the training of the method of the present invention;
FIG. 5 is a schematic diagram of a multi-stage discriminator network in a de-entanglement network used in the training of the method of the present invention;
fig. 6 is a diagram of the effects actually produced by the system of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
A clothing image generation system and method based on a disentanglement network, mainly intended to assist clothing designers and to provide clothing image designs for consumers. The method is:
The system receives the user's clothing generation requirement in advance, converts the requirement into a label vector L, feeds it into a pre-trained neural network, and generates and displays clothing images according to the user's input conditions.
The neural network is a conditional generative adversarial network and mainly comprises two parts: 1) the disentanglement generator for garment images, a generative model that captures the image distribution of the original data set and, according to the input conditions, generates garment images ever closer to that distribution; 2) the multi-stage discriminator, which judges whether its input is true or false: it judges a real garment image with a specific category attribute as true, and a garment image produced by the generative model as false. The generative model continuously improves its parameters according to the discrimination results, finally generating clothing images whose image quality and category attributes meet the user's requirements.
More specifically, as shown in fig. 1, a clothing image generation method based on a disentanglement network includes the following steps:
s101, acquiring a plurality of clothing images with category labels.
The clothing images are clothing images of various kinds with white backgrounds and size 512 × 512; the clothing categories may be suit, shirt, one-piece dress, sweater and so on. Each garment has a specific type, and one-hot encoding is adopted: for example, 1 represents that the garment is a sweater and 4 that it is a suit. A specific code is set for each category and converted into a label vector with a 1 at the corresponding position and 0 elsewhere; this encoding serves as the condition input of the generation network.
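The one-hot scheme described above can be sketched in a few lines of Python. The category table below is hypothetical (the patent only gives "sweater = 1" and "suit = 4" as examples); the vector length of 15 matches the 15 garment types of the data set used later:

```python
def one_hot(index, num_classes):
    """Return a one-hot label vector with a 1 at `index`
    (1-based, matching the patent's numbering, e.g. 1 = sweater, 4 = suit)."""
    vec = [0] * num_classes
    vec[index - 1] = 1
    return vec

# hypothetical category table for illustration only
categories = {"sweater": 1, "shirt": 2, "dress": 3, "suit": 4}
label = one_hot(categories["suit"], num_classes=15)
```

The resulting vector has a single 1 at the position of the chosen category and 0 everywhere else, and is concatenated with the color label before being fed to the conditional network.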
S102, acquiring a color label of the clothing image, and cascading with the clothing category label.
The main color styles of the garments include white, black, green, etc., and each garment is assigned a color label. Specifically: the dominant color of each clothing image can be obtained with the OpenCV tool; the color of each clothing image is encoded separately, for example red as 1 and green as 5, and then converted into the corresponding color label by one-hot encoding. The color label is then concatenated with the category label of each clothing image and used as the condition of the conditional adversarial generation network, controlling the generation result.
The OpenCV tool converts the RGB model of the clothing image into an HSV model, in which H ∈ [0,180), S ∈ [0,255] and V ∈ [0,255] represent the hue, saturation and value (brightness) of the image respectively.
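The hue-based color labeling can be sketched with the standard library alone: `colorsys` returns H in [0, 1), which is rescaled to OpenCV's H ∈ [0,180), S, V ∈ [0,255] ranges quoted above. The hue and saturation thresholds below are commonly used illustrative values, not the patent's actual table:

```python
import colorsys

def dominant_color_label(r, g, b):
    """Map an RGB pixel (0-255 per channel) to a coarse color name via HSV,
    using OpenCV-style ranges H in [0,180), S and V in [0,255].
    The threshold bands are illustrative assumptions."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    h, s, v = h * 180, s * 255, v * 255   # rescale to OpenCV ranges
    if v < 46:                   # very dark pixels
        return "black"
    if s < 43 and v > 200:       # bright, unsaturated pixels
        return "white"
    if h < 10 or h >= 156:       # hue wraps around at red
        return "red"
    if 35 <= h < 77:
        return "green"
    return "other"
```

In practice the label would come from the most frequent band over all garment pixels rather than a single pixel, but the per-pixel mapping is the core of the step.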
S103, training a de-entanglement neural network, and initializing a de-entanglement generator parameter and a discriminator network parameter of the de-entanglement neural network.
The training data required by the system's pre-trained disentanglement neural network are original garment images, each of which is preprocessed and its categories encoded;
In the training stage, the clothing data set is used to train a disentanglement neural network consisting of a disentanglement generator and a multi-stage discriminator. The inputs of the disentanglement neural network are the category label l_class of the clothing image, the color label l_color, and a random noise variable z; the disentanglement generator outputs G(z, (l_class, l_color)), and the multi-stage discriminator outputs log D(G(z, (l_class, l_color))) and log D(I_real), its discrimination results for the generated picture and the real picture respectively, where I_real is the concatenation of the real garment image and its label;
the overall objective function during training is:
Figure BDA0002986573760000061
i.e., the global GAN loss function is:
Figure BDA0002986573760000062
among them, the above-mentioned materials are used,
Figure BDA0002986573760000063
respectively, subject to the discrimination result expectation of the true distribution and the de-entanglement generator generated distribution,
Figure BDA0002986573760000064
training processes of the minimum de-entanglement generator of the discriminator to generate a distribution discrimination expectation and a maximum true distribution discrimination expectation, respectively,/true、xtrueAnd l respectively represent a label of the real clothing image, the real clothing image and a label for generating the clothing image.
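The two expectation terms of L_GAN above reduce to batch means of log-scores. A toy NumPy sketch with made-up discriminator outputs (no real network is involved; the batch values are purely illustrative):

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN value function
    E[log D(x_true, l_true)] + E[log(1 - D(G(z, l), l))].
    d_real, d_fake: discriminator outputs in (0, 1) on real / generated batches."""
    return np.mean(np.log(d_real)) + np.mean(np.log1p(-d_fake))

# illustrative discriminator scores on a small batch
d_real = np.array([0.9, 0.8, 0.95])   # D should push these toward 1
d_fake = np.array([0.1, 0.2, 0.05])   # D should push these toward 0
v = gan_value(d_real, d_fake)
```

The discriminator's gradient step raises this value while the generator's step lowers it, which is exactly the min_G max_D alternation in the objective.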
To improve the effect of the GAN loss function, the multi-stage discriminator adopts intermediate feature matching: it can output intermediate features, and the intermediate feature outputs for the generated image are matched against those for the real image, with the feature matching loss function:

    L_FM(G, D_k) = E_{(x_true, l_true)} Σ_{i=1}^{T} (1/N_i) ||D_k^(i)(x_true, l_true) − D_k^(i)(G(z, l), l)||_1

where T is the total number of network layers, N_i denotes the number of elements in layer i, and D_k represents one of the two sub-discriminators, acting as a feature extractor; the minimization of the feature matching loss, min_G L_FM(G, D_k), is performed only when training the disentanglement generator.
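The per-layer 1/N_i · L1 term above is just the mean absolute difference of one layer's features, summed over layers. A NumPy sketch with random stand-in features (the layer shapes are illustrative, not the discriminator's real ones):

```python
import numpy as np

def feature_matching_loss(feats_real, feats_fake):
    """L_FM = sum over layers i of (1/N_i) * ||D^(i)(real) - D^(i)(fake)||_1,
    where N_i is the number of elements in layer i. Used only when
    training the generator, with the discriminator as feature extractor."""
    loss = 0.0
    for fr, ff in zip(feats_real, feats_fake):
        loss += np.abs(fr - ff).mean()   # (1/N_i) * L1 norm of the difference
    return loss

rng = np.random.default_rng(0)
feats_real = [rng.normal(size=(16, 32, 32)), rng.normal(size=(32, 16, 16))]
feats_fake = [f + 0.1 for f in feats_real]   # fake features offset by a constant
loss = feature_matching_loss(feats_real, feats_fake)
```

With a constant offset of 0.1 per element over two layers the loss is 0.2, and it vanishes when the generated features exactly match the real ones, which is the training signal this term provides.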
The disentanglement generator extracts the category features and style features of the clothing image and disentangles the features of the clothing's colors and patterns. It is mainly composed of a mapping network ω and a progressive generation network G_progress, where G_progress is a stack of convolution modules with adaptive instance normalization modules.
The random noise z and the label encoding L are the input of the mapping network ω, and the intermediate latent code together with adaptive random noise is the input of the progressive generation network G_progress; for an output image resolution of 512 × 512, G_progress has 16 layers, and the final convolution layer converts the result to an RGB image using a 1 × 1 convolution kernel. The mapping network ω outputs the intermediate latent code ψ = (ψ_style, ψ_bias), which controls the parameters of the adaptive instance normalization module, the normalization function being:
    AdaIN(x_i, ψ) = ψ_style,i · (x_i − μ(x_i)) / σ(x_i) + ψ_bias,i

where each feature map x_i is normalized separately, scaled by the intermediate latent code ψ_style and shifted by the offset ψ_bias; the dimension of the intermediate latent code ψ is therefore twice that of the feature maps x.
The mapping network ω is composed of 6 fully-connected network layers with input and output sizes of 512 × 512; its input is a constant plus the one-hot label encoding, which it maps into an intermediate latent space. The mapped variable then passes through convolution modules with adaptive instance normalization modules; each convolution module can be regarded as an upsampling module, and the progressive generation network is composed of these upsampling modules, upsampling by linear interpolation with a magnification factor of 2. The convolution layers use 3 × 3 kernels; as the number of stacked channels decreases, each convolution layer is followed by a ReLU nonlinear transformation layer and a batch normalization layer. Gaussian noise is added at the end of each layer to increase the randomness of the generated image; the inputs are the intermediate latent features and random Gaussian noise obeying a Gaussian distribution with mean 0 and variance 1. Finally, a convolution layer with a 1 × 1 kernel converts the output into RGB data, the final output image.
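The 2× linear-interpolation upsampling used by each convolution module can be sketched with separable 1-D interpolation in NumPy; a single-channel feature map is shown for brevity (a real module would apply this per channel before its 3 × 3 convolution):

```python
import numpy as np

def upsample2x_linear(feat):
    """Double the spatial resolution of an (H, W) feature map with
    separable linear interpolation, magnification factor 2."""
    h, w = feat.shape
    rows_new = np.linspace(0, h - 1, 2 * h)
    cols_new = np.linspace(0, w - 1, 2 * w)
    # interpolate along columns first, then along rows
    tmp = np.stack([np.interp(cols_new, np.arange(w), row) for row in feat])
    out = np.stack([np.interp(rows_new, np.arange(h), tmp[:, j])
                    for j in range(2 * w)], axis=1)
    return out

feat = np.arange(16.0).reshape(4, 4)   # tiny 4x4 feature map
up = upsample2x_linear(feat)           # 8x8 result
```

Unlike nearest-neighbor repetition, linear interpolation produces smooth transitions between grid points while preserving the original corner values, which reduces blocky artifacts in the progressively grown image.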
The convolution module contains an adaptive instance normalization module and uses the style variable y = (y_s, y_b) output by the mapping network for instance normalization. The mapping network maps the input signal into a latent space variable ω, which is then converted into the style variable y = (y_s, y_b); y_s and y_b are the scaling factor and the bias respectively. After each convolution module, the style variable controls the parameters of adaptive instance normalization, whose operation can be expressed as:

    AdaIN(x_i, y) = y_s,i · (x_i − μ(x_i)) / σ(x_i) + y_b,i

where each feature map x_i is regularized separately and then normalized with the style variable y, so the dimension of y is twice the number of feature maps of that network layer; finally, a noise signal is added directly to increase the randomness of the generated result.
The multi-stage discriminator is a down-sampling network with a local discriminator and a global discriminator: for an image of size 512 × 512, the local discriminator down-samples features of size 256 × 256 while the global discriminator down-samples the whole image, and the final sampling results are combined to obtain the discrimination result of the multi-stage discriminator. To ensure the stability of the GAN during training, spectral normalization is applied to all network layers of the disentanglement generator and the multi-stage discriminator, and the weights of all network layers are initialized from a Gaussian distribution with mean 0 and variance 1.
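The local/global two-scale scheme can be sketched as one scoring function applied at two resolutions and averaged. The toy `score_fn` below is a placeholder for a real discriminator network, and averaging is one plausible way to "combine" the results (the patent does not specify the combination rule):

```python
import numpy as np

def downsample2x(img):
    """2x average-pool downsampling (a stand-in for strided convolutions)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_score(img, score_fn):
    """Combine a global score on the full image with a local score on a
    2x-downsampled copy, mimicking the global + local discriminator pair."""
    global_score = score_fn(img)
    local_score = score_fn(downsample2x(img))
    return (global_score + local_score) / 2

rng = np.random.default_rng(0)
img = rng.uniform(size=(512, 512))
toy_score = lambda x: float(x.mean())  # placeholder for a real discriminator
score = multiscale_score(img, toy_score)
```

Running the same judgment at two scales lets the discriminator penalize both global layout errors and local texture errors, which is the motivation for the multi-stage design.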
And S104, inputting the real image and the clothing image generated by the disentanglement generator into the discriminator for judgment.
The main objective of the disentanglement generator is to generate a false image that is "indistinguishable" from the true image, and the objective of the discriminator is to determine the true image as true and the false image as false.
Specifically, the real image together with its category label and color style label is fed to the multi-stage discriminator to be judged true or false. At the start of training, the multi-stage discriminator judges the real image with its matched label as true, i.e. 1, and the clothing image generated by the disentanglement generator as false, i.e. 0. The disentanglement generator continuously improves the quality of its generated clothing images in the hope of fooling the discriminator, while the discriminator continuously improves its discrimination ability so as to distinguish real images from fake ones accurately; the two compete with each other until, finally, the images generated by the disentanglement generator can confuse the discriminator.
S105, adjusting and optimizing the parameters of the disentanglement network according to the discrimination value and the output image. The disentanglement generator and the multi-stage discriminator are iterated alternately; the global objective function during training is expressed as:

    min_G max_D L_GAN(G, D)

an alternating iterative process in which the discriminator minimizes its discrimination values on the images generated by the disentanglement generator and maximizes its discrimination expectation on real images.
In each training round, parameters are optimized according to the discrimination value and the objective function. To improve the quality of image generation, a VGG network is used to extract features of the generated images, and a perceptual loss function is added in the training process:

    L_percep = Σ_{i=1}^{T} (1/M_i) ||F^(i)(x_true) − F^(i)(G(z, l))||_1

where F^(i) denotes the i-th layer of the VGG network, containing M_i elements. The objective function is:

    min_G ( (max_D Σ_k L_GAN(G, D_k)) + λ1 Σ_k L_FM(G, D_k) + λ2 L_percep )
λ1 and λ2 are hyper-parameters that need to be tuned during training; the parameters of the disentanglement generator and the discriminator are updated by the gradient descent method.
To speed up the training process when optimizing the overall objective, the disentanglement generator parameters θ_G are updated 5 times for each single update of the discriminator parameters θ_D, and the error is back-propagated. The disentanglement generator and the discriminator are each updated by a gradient descent algorithm to reduce the loss function, using the Adam optimizer with initial learning rates of 0.001 for the disentanglement generator and 0.004 for the discriminator. The total number of training rounds is 20000; the learning rate is unchanged during the first 10000 rounds and decays linearly to 0 during the last 10000 rounds. The optimizer parameters are β1 = 0 and β2 = 0.999, and the weights are initialized from a Gaussian distribution with mean 0 and variance 0.01.
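The update schedule described above (5 generator updates per discriminator update; a constant learning rate for the first 10000 rounds, then linear decay to 0) can be captured in two small helper functions; this sketch only models the schedule, not the optimizer itself:

```python
def learning_rate(step, base_lr, total_steps=20000, decay_start=10000):
    """Constant base_lr for the first decay_start rounds, then linear
    decay to 0 over the remaining rounds, as in the patent's schedule."""
    if step < decay_start:
        return base_lr
    return base_lr * (total_steps - step) / (total_steps - decay_start)

def update_discriminator(step, g_updates_per_d=5):
    """The generator is updated every round; the discriminator only once
    per 5 generator updates."""
    return step % g_updates_per_d == 0

g_lr = learning_rate(5000, 0.001)    # generator lr, still in the constant phase
d_lr = learning_rate(15000, 0.004)   # discriminator lr, halfway through decay
```

At round 15000 the discriminator's rate has decayed to half of 0.004; updating the discriminator less often keeps it from overpowering the generator early in training.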
As shown in fig. 2, the system using the clothing image generation method based on the disentanglement network includes:
a user registration module: confirming the identity information of the user, and recording the clothes type and the color style preferred by the user;
an input conversion module: receiving a clothing design requirement input by a user, and converting the clothing design requirement into an input corresponding to a model;
a clothing image design generation module: the converted clothing design requirements of the user are sent into a trained model, a network receiving noise signal and condition input are generated through disentanglement, and a corresponding clothing image is output;
a display module: the system displays the clothing image output by the disentanglement network to the user, and the user can select the favorite and satisfactory image.
By processing images of different data sets, experimental results verifying the scheme provided by this embodiment of the application were obtained; the data sets and experimental results are introduced as follows:
the data set used by the invention is a data set provided in an Appeal Classification With Style (ACWS) article, the data set comprises more than 80000 pictures and 15 garment types, the pictures of the data set are unified into 512 x 512 size by methods of zooming, stretching, bilinear interpolation and the like, the label and the garment image of each garment type are simultaneously sent to a multi-stage discriminator for judgment, after iterative training 20000 rounds, the experimental result is tested, as shown in figure 6, the trained model forms an algorithm module of a garment image generation system, the user inputs different types of garment type labels and color Style labels, and the system can generate the garment image specified by the user.
To demonstrate the effectiveness of the invention in feature disentanglement, ablation experiments were also performed. They show that the disentanglement generator and the multi-stage discriminator both contribute greatly to the quality of the generated clothing images. The Inception Score (IS) is an objective evaluation index commonly used for generative models; a higher score indicates a better generative model. The Inception Score is used to evaluate the results of the invention, and the evaluation results are shown in the following table:
Method | Inception Score (IS)
Without disentanglement generator (convolution with up-sampling only) | 1.7894 ± 0.1136
Without multi-stage discriminator | 1.9347 ± 0.1220
The invention | 2.2010 ± 0.0884
It can be seen that the method adopted by the invention has the highest IS value, i.e. the best generation effect. However, for a generative model it is not sufficient to evaluate by the IS value alone. To illustrate the perceptual quality of the generated images, the Learned Perceptual Image Patch Similarity (LPIPS) index is also tested; the lower the value, the better the perceptual effect. The evaluation results are shown in the following table:
Method | Image perceptual similarity index (LPIPS)
Without disentanglement generator (convolution with up-sampling only) | 0.1297
Without multi-stage discriminator | 0.1349
The invention | 0.1126
The LPIPS value of the method adopted by the invention is the lowest, indicating the perception closest to the real image. Together, the two quantitative evaluation indices show that the method adopted by the invention achieves the best effect.
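The Inception Score used above is defined as IS = exp(E_x[KL(p(y|x) ‖ p(y))]), where p(y|x) is a classifier's class distribution for a generated image and p(y) its marginal over all images. A minimal sketch of that formula, assuming the per-image class probabilities are already given (a real evaluation would obtain them from an Inception network):

```python
import math

def inception_score(probs):
    """IS = exp(mean_x KL(p(y|x) || p(y))), with probs a list of per-image
    class-probability vectors."""
    n, k = len(probs), len(probs[0])
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in probs:
        kl_sum += sum(p[j] * math.log(p[j] / marginal[j])
                      for j in range(k) if p[j] > 0)
    return math.exp(kl_sum / n)

# Identical uniform predictions carry no diversity or confidence signal: IS = 1.
uniform = inception_score([[0.5, 0.5], [0.5, 0.5]])
# Confident and diverse predictions push IS toward the class count (2 here).
sharp = inception_score([[1.0, 0.0], [0.0, 1.0]])
```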
The scheme provided by the invention can be applied to various fields such as electronic commerce, application software and the garment design industry. It should be noted that all the optional technical solutions above may be combined arbitrarily to form optional embodiments of the present application, and details are not repeated here.

Claims (10)

1. A clothing image generation method based on a disentanglement neural network, characterized by comprising the following steps:
s101, acquiring a plurality of clothing images with category labels;
s102, acquiring a color label of the clothing image, and cascading the color label with a clothing type label;
s103, training a de-entanglement neural network, and initializing a de-entanglement generator parameter and a discriminator network parameter of the de-entanglement neural network;
s104, inputting the cascaded labels into the de-entanglement neural network, and judging a real image and a clothing image generated by the de-entanglement generator;
and S105, adjusting and optimizing the disentanglement network parameters according to the judgment value and the output clothing image.
2. The clothing image generation method based on the disentanglement neural network as claimed in claim 1, wherein the category label and the color label of the clothing image are obtained by one-hot coding, and the color classification of the clothing image is performed by converting the RGB model of the clothing image into an HSV model using an OpenCV tool.
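The RGB-to-HSV conversion and one-hot labelling of claim 2 can be sketched with the standard library instead of OpenCV. Note that `colorsys` returns h, s, v in [0, 1], while OpenCV scales H to [0, 180] for 8-bit images; the bucket count and the achromatic thresholds below are illustrative assumptions:

```python
import colorsys

def color_bucket(r, g, b, buckets=6):
    """Map an RGB pixel to a coarse hue bucket usable as a color class label."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < 0.2 or v < 0.2:      # assumed thresholds for gray/black tones
        return buckets          # extra bucket for achromatic colors
    return int(h * buckets) % buckets

def one_hot(index, size):
    """One-hot encode a class index, as used for the category and color labels."""
    return [1.0 if i == index else 0.0 for i in range(size)]

label = one_hot(color_bucket(255, 0, 0), 7)  # pure red falls in hue bucket 0
```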
3. The clothing image generation method based on the disentanglement neural network according to claim 1, wherein the step S103 specifically includes:
s103_1, training a disentanglement neural network, wherein the disentanglement neural network is a conditional adversarial network comprising a disentanglement generator G and a multi-stage discriminator D; the disentanglement generator generates a picture after extracting the style features of the clothing image, and the multi-stage discriminator discriminates between the real picture and the generated picture. The input of the disentanglement neural network is the category label l_class of the clothing image, the color label l_color and a random noise variable z; the disentanglement generator outputs G(z, (l_class, l_color)), and the multi-stage discriminator outputs log(D(G(z, (l_class, l_color)))) and log(D(I_real)), corresponding to its discrimination results on the generated picture and the real picture respectively, where I_real is the concatenation of the real clothing image and its label;
the overall objective function during training is:
min_G max_D L_GAN(G, D)
i.e., the overall GAN loss function is:
L_GAN(G, D) = E_(x_true, l_true)[log D(x_true, l_true)] + E_z[log(1 − D(G(z, l), l))]
where E_(x_true, l_true)[·] and E_z[·] are the expectations of the discrimination results over the true distribution and over the distribution generated by the disentanglement generator respectively; min_G and max_D denote the training processes in which the disentanglement generator minimizes the generated-distribution discrimination expectation and the discriminator maximizes the true-distribution discrimination expectation; l_true, x_true and l respectively denote the label of the real clothing image, the real clothing image, and the label used to generate the clothing image;
s103_2, performing spectral normalization on all network layers of the disentanglement generator and the multi-stage discriminator; the weight initialization of all network layers follows a Gaussian distribution with mean 0 and variance 1.
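The conditional GAN objective of claim 3 can be evaluated for scalar discriminator outputs to see how the two terms trade off. This is a minimal numeric sketch, treating `d_real` and `d_fake` as the discriminator's probabilities for a (real image, label) pair and a (generated image, label) pair:

```python
import math

def gan_loss(d_real, d_fake):
    """Value of log D(x_true, l_true) + log(1 - D(G(z, l), l)) for scalar
    discriminator probabilities in (0, 1)."""
    return math.log(d_real) + math.log(1.0 - d_fake)

# The discriminator maximizes this quantity; the generator minimizes the
# second term. At the classic equilibrium D(.) = 0.5 everywhere, the value
# is 2 * log(0.5) = -log(4).
equilibrium = gan_loss(0.5, 0.5)
confident = gan_loss(0.9, 0.1)  # a strong discriminator scores higher
```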
4. The clothing image generation method based on the disentanglement neural network as claimed in claim 3, wherein the multi-stage discriminator is composed of a local discriminator and a global discriminator, which perform down-sampling discrimination on the real picture and the generated picture at two different scales respectively; the discrimination results at the two scales are finally combined to obtain the discrimination result of the multi-stage discriminator.
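The two-scale scheme of claim 4 can be sketched as follows: the same (stub) discriminator scores the image at full resolution and after down-sampling, and the two scores are combined. The averaging combination and the mean-intensity stub are illustrative assumptions:

```python
def downsample2(img):
    """Average-pool a 2-D grid by a factor of 2 (even dimensions assumed),
    producing the coarser scale fed to the global discriminator."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*r][2*c] + img[2*r][2*c+1] +
              img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
             for c in range(w)] for r in range(h)]

def multi_scale_score(img, discriminator):
    """Score the image at two scales and combine the results by averaging."""
    full = discriminator(img)
    coarse = discriminator(downsample2(img))
    return (full + coarse) / 2.0

mean_intensity = lambda im: sum(map(sum, im)) / (len(im) * len(im[0]))  # stub D
score = multi_scale_score([[0.0, 1.0], [1.0, 0.0]], mean_intensity)
```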
5. The clothing image generation method based on the disentanglement neural network according to claim 3, wherein the intermediate feature outputs of the multi-stage discriminator for the generated image are matched with those for the real image, and the feature matching loss function is:
L_FM(G, D_k) = Σ_{i=1}^{T} (1/N_i) [ ‖ D_k^(i)(x_true, l_true) − D_k^(i)(G(z, l), l) ‖_1 ]
where T is the total number of network layers, N_i denotes the number of elements in layer i, and D_k denotes a sub-discriminator used as a feature extractor; the feature matching loss min_G L_FM(G, D_k) is minimized only when training the disentanglement generator.
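The feature matching loss of claim 5 reduces to a per-layer normalized L1 distance between discriminator features of the real and generated images. A minimal sketch, with hand-made feature vectors standing in for the sub-discriminator's layer outputs:

```python
def feature_matching_loss(real_feats, fake_feats):
    """L_FM = sum_i (1/N_i) * ||D^(i)(real) - D^(i)(fake)||_1, where the two
    arguments are lists of per-layer feature vectors from one sub-discriminator."""
    loss = 0.0
    for fr, ff in zip(real_feats, fake_feats):
        n_i = len(fr)  # N_i: number of elements in layer i
        loss += sum(abs(a - b) for a, b in zip(fr, ff)) / n_i
    return loss

real = [[1.0, 2.0], [0.0, 0.0, 0.0, 4.0]]   # two layers, N_1 = 2, N_2 = 4
fake = [[1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
l_fm = feature_matching_loss(real, fake)     # |2-0|/2 + |4-0|/4 = 2.0
```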
6. The clothing image generation method based on the disentanglement neural network as claimed in claim 3, wherein the disentanglement generator is composed of a mapping network ω and a progressive generation network G_progress; the mapping network ω is composed of fully-connected network layers, takes the random noise and the label codes as input to an intermediate latent space, and outputs an intermediate latent code ψ = (ψ_style, ψ_bias) that controls the parameters of an adaptive instance normalization layer. The normalization function is:
AdaIN(x_i, ψ) = ψ_style,i · (x_i − μ(x_i)) / σ(x_i) + ψ_bias,i
wherein each feature map x_i is normalized separately, scaled by a factor ψ_style and offset by ψ_bias. The progressive generation network G_progress is composed of convolution modules with adaptive instance normalization layers; each convolution module performs up-sampling by linear interpolation with a magnification factor of 2. Its inputs are the intermediate latent code and random Gaussian noise, the added noise following a Gaussian distribution with mean 0 and variance 1. Finally, the output is converted into an RGB image through a convolution layer with kernel size 1.
7. The method as claimed in claim 6, wherein the mapping network maps the input signal to a latent space variable ω, and an affine transformation then transforms the latent variable into a style variable y = (y_s, y_b) that is output to the progressive generation network G_progress; y_s and y_b are the scaling factor and the bias respectively. After each convolution module, the style variable controls the parameters of the adaptive instance normalization, whose operation is expressed as:
AdaIN(x_i, y) = y_s,i · (x_i − μ(x_i)) / σ(x_i) + y_b,i
wherein each feature map x_i is normalized separately and then modulated with the style variable y, so the dimension of y is twice the number of feature maps in that network layer.
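The adaptive instance normalization of claims 6 and 7 can be sketched for a single flattened feature map: normalize to zero mean and unit variance, then apply the style scale and bias. The epsilon term is a standard numerical-stability assumption:

```python
import math

def adain(feature_map, scale, bias, eps=1e-8):
    """Adaptive instance normalization of one feature map x_i:
    scale * (x_i - mean) / std + bias."""
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((v - mean) ** 2 for v in feature_map) / n
    std = math.sqrt(var + eps)
    return [scale * (v - mean) / std + bias for v in feature_map]

x = [1.0, 2.0, 3.0, 4.0]
y = adain(x, scale=2.0, bias=5.0)  # output mean ~ 5.0, output std ~ 2.0
```

After AdaIN the statistics of the feature map are fully determined by the style code, which is exactly why the latent code can steer the generated style at every resolution.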
8. The clothing image generation method based on the disentanglement neural network according to claim 1, wherein the step S104 specifically comprises: the real image together with its category label and color style label is sent to the multi-stage discriminator to judge true or false; during training of the multi-stage discriminator, the real image with its matching labels is judged as true, and the clothing image generated by the disentanglement generator is judged as false.
9. The clothing image generation method based on the disentanglement neural network according to claim 1, wherein the step S105 specifically comprises: in each training iteration, the parameters are optimized according to the judgment value and the objective function. Specifically, features of the generated images are extracted through a VGG network, and a perceptual loss function is added to the training process:
L_percep = Σ_i (1/M_i) ‖ F^(i)(x_true) − F^(i)(G(z, l)) ‖_1
where F^(i) denotes the i-th layer of the VGG network, containing M_i elements. The overall objective function is:
min_G ( max_D L_GAN(G, D) + λ1 L_FM(G, D_k) + λ2 L_percep )
wherein λ1 and λ2 are hyper-parameters adjusted during training. After each iteration of the network, the parameters of the disentanglement generator and the multi-stage discriminator are updated by gradient descent: for every 5 updates of the generator parameters θ_G, the discriminator parameters θ_D are updated once, and the error is propagated backwards.
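The combined objective and the 5:1 update schedule of claim 9 can be sketched as follows. The λ defaults of 10.0 are assumptions for illustration, not values stated in the patent:

```python
def total_loss(gan, fm, percep, lam1=10.0, lam2=10.0):
    """Combined objective: GAN loss plus weighted feature-matching and
    perceptual (VGG) terms; lam1/lam2 are the tunable hyper-parameters."""
    return gan + lam1 * fm + lam2 * percep

def training_schedule(iterations):
    """Count updates when the generator parameters theta_G are updated 5 times
    for every single update of the discriminator parameters theta_D."""
    g_updates = d_updates = 0
    for step in range(iterations):
        g_updates += 1                 # theta_G updated every iteration
        if (step + 1) % 5 == 0:
            d_updates += 1             # theta_D updated once per 5 G updates
    return g_updates, d_updates

g, d = training_schedule(20)           # 20 generator updates, 4 discriminator updates
```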
10. A clothing image generation system based on a disentanglement neural network is characterized by comprising:
a user registration module: confirming the identity information of the user, and recording the clothes type and the color style preferred by the user;
an input conversion module: receiving a clothing design requirement input by a user, and converting the clothing design requirement into an input corresponding to a model;
a clothing image design generation module: the converted clothing design requirements of the user are sent into the trained model; the disentanglement generation network receives the noise signal and the condition input and outputs a corresponding clothing image;
a display module: the system displays the clothing images output by the disentanglement network to the user, from which the user can select satisfactory images.
CN202110304774.3A 2021-03-22 2021-03-22 Clothing image generation system and method based on disentanglement network Pending CN113052230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304774.3A CN113052230A (en) 2021-03-22 2021-03-22 Clothing image generation system and method based on disentanglement network


Publications (1)

Publication Number Publication Date
CN113052230A true CN113052230A (en) 2021-06-29

Family

ID=76514440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304774.3A Pending CN113052230A (en) 2021-03-22 2021-03-22 Clothing image generation system and method based on disentanglement network

Country Status (1)

Country Link
CN (1) CN113052230A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722783A (en) * 2021-07-08 2021-11-30 浙江海阔人工智能科技有限公司 User-oriented intelligent garment design system and method based on deep learning model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111785261A (en) * 2020-05-18 2020-10-16 南京邮电大学 Cross-language voice conversion method and system based on disentanglement and explanatory representation
CN111951153A (en) * 2020-08-12 2020-11-17 杭州电子科技大学 Face attribute fine editing method based on generation of confrontation network hidden space deconstruction
CN112100908A (en) * 2020-08-31 2020-12-18 西安工程大学 Garment design method for generating confrontation network based on multi-condition deep convolution


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GÖKHAN YILDIRIM ET AL.: "Disentangling Multiple Conditional Inputs in GANs", arXiv *
NEERAJ KUMAR ET AL.: "Robust One Shot Audio to Video Generation", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) *
TERO KARRAS ET AL.: "A Style-Based Generator Architecture for Generative Adversarial Networks", arXiv *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210629