CN113393550A - Fashion garment design synthesis method guided by postures and textures - Google Patents
- Publication number
- CN113393550A (application CN202110660701.8A)
- Authority
- CN
- China
- Prior art keywords
- texture
- semantic
- fashion
- loss
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/001—Texturing; Colouring; Generation of texture or colour
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a method for pose- and texture-guided fashion garment design synthesis. The method comprises the following steps: 1. collect task data from an existing fashion data set, preprocess the data, and construct a data set of fashion images, pose information and semantic information; 2. construct a two-stage generation model with natural and accurate fashion images as the goal; the generation model comprises a semantic layout generation network and a texture generation network, realizing effective transfer of textures and generating diversified fashion images; 3. train the semantic layout generation network and the texture transfer network on the collected data set by minimizing adversarial loss, cross-entropy loss, pixel-level loss, perceptual loss and style loss; 4. train the network parameters of the generation model by back-propagation until the whole model converges, then generate the corresponding fashion images. Experiments on the Fashion-Gen data set obtain good results both quantitatively and qualitatively.
Description
Technical Field
The invention provides a novel method for pose- and texture-guided fashion design synthesis (Pose and Texture Guided Multi-View Fashion Design Synthesis). It mainly relates to converting an input human pose into a series of human semantic layouts with a semantic layout generation network, and to realizing texture transfer and generating realistic fashion images with a texture transfer network built on generative adversarial networks.
Background
Driven by the high demand of real-life applications and by breakthroughs in related theories and technologies such as deep learning, machine learning, computer vision and multimedia, tasks combining artificial intelligence and fashion have received considerable attention in recent years, for example garment recognition, garment retrieval, fashion recommendation and fashion trend prediction, all of which take garments as the subject of research. Owing to the remarkable results obtained by generative models (e.g., GANs, VAEs) in image synthesis, computer researchers have also developed a wide range of research applications in fashion image synthesis, such as human-pose-guided garment image generation, text-guided garment image generation, virtual try-on based on image generation models, and garment design applications based on image generation models.
Human-pose-guided garment image generation takes a human pose as the input condition, modifies an existing garment image containing a person model, and synthesizes a brand-new garment image. Text-guided garment image generation takes a text description of garment attributes as the input condition, modifies an existing garment image containing a person model, and synthesizes a brand-new garment image. Virtual try-on based on image generation models is given a picture of a person model and a picture of a target garment; it first generates a rough try-on result in which the deformed target garment is transferred to the correct region of the person model. Garment design applications based on image generation models control the output design drawings through information such as color, texture and shape. Our method belongs to this last class: it generates diverse fashion garment designs controlled by pose and texture information, reducing designers' workload and accelerating the design cycle of fashion products.
For the pose- and texture-guided fashion image generation task, a simple idea is to directly apply standard image-to-image translation models such as pix2pix and pix2pixHD. However, these methods essentially learn a mapping from a source image to a target image, and experimental results show that this alone cannot accomplish our task. Furthermore, our task requires solving several challenging problems.
1) The guiding pose contains too little information
The human pose is usually represented by two-dimensional joint points; it contains only the joint locations and no shape information, so existing methods find it difficult to infer the human body structure and the garment structure from such rough pose information.
2) Texture transfer is difficult to implement
Because a common convolutional network processes features locally, existing fashion image generation methods lack a dedicated texture transfer mechanism for effectively transferring the texture of a fashion image. Moreover, since garment regions are usually irregular, accurately transferring texture blocks of arbitrary size to the corresponding garment regions is also a challenge in synthesizing natural and realistic fashion images. Existing methods can generate solid-color textures but cannot effectively transfer complex textures; they typically generate only local textures or incorrect textures.
3) Limited diversity of generated fashion garments
Existing fashion image generation methods are generally guided by human pose information or human semantic information, so the type of the garment structure is fixed and they cannot generate fashion images covering the various garment types and fashion styles found in real situations.
Our approach addresses these problems by synthesizing diverse and accurate fashion images under the guidance of pose and texture information.
Disclosure of Invention
The invention provides a method for pose- and texture-guided fashion garment design synthesis.
A method for pose- and texture-guided fashion garment design synthesis comprises the following steps:
Step (1): collect task data from an existing fashion data set, preprocess the data, and construct a data set of fashion images, pose information and semantic information.
Step (2): with the generation of natural and accurate fashion images as the goal, construct a two-stage generation model on the existing fashion data set; the generation model comprises a semantic layout generation network and a texture generation network, realizing effective transfer of textures and generating diversified fashion images.
Step (3): train the semantic layout generation network and the texture transfer network on the collected data set by minimizing adversarial loss, cross-entropy loss, pixel-level loss, perceptual loss and style loss.
Step (4): train the network parameters of the generation model of step (3) by back-propagation until the whole model converges, then generate the corresponding fashion images.
Collecting task data with an existing fashion data set in step (1) means that we evaluate our method on the Fashion-Gen data set, because it contains various complex garment textures. We selected 4 major garment categories (i.e., dress, shirt, sweater, and coat) from the 48 fashion categories in the Fashion-Gen data set for evaluation.
Constructing the data set of fashion images, pose information and semantic information in step (1) means that, for each fashion image, the person's pose is estimated with a state-of-the-art pose estimator; the computed pose information comprises 18 joint coordinate points. In addition, an advanced human parser is used to compute human semantic information containing 20 labels, each representing a specific part of the body, such as the face, hair, arms, legs and clothing regions.
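As a concrete illustration, the preprocessing above can be sketched as follows. This is a minimal sketch; the record layout and the helper name `make_sample` are our own illustration, not part of the patent:

```python
import numpy as np

# Hypothetical record layout for one training sample, following the
# preprocessing described above: a fashion image, 18 (x, y) joint
# coordinates from a pose estimator, and a 20-label human-parsing map.
def make_sample(image, joints, parse_map, num_labels=20):
    assert joints.shape == (18, 2), "expected 18 joint coordinate points"
    assert parse_map.max() < num_labels
    # One-hot semantic layout (H, W, 20): one channel per body-part label.
    semantic = np.eye(num_labels, dtype=np.float32)[parse_map]
    return {"image": image, "pose": joints, "semantic": semantic}

sample = make_sample(
    image=np.zeros((256, 256, 3), dtype=np.uint8),
    joints=np.zeros((18, 2), dtype=np.float32),
    parse_map=np.zeros((256, 256), dtype=np.int64),
)
```

The one-hot layout gives the downstream networks one channel per body-part label, which is the form the cross-entropy loss in step (3) expects.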
Constructing a two-stage generation model in the step (2), wherein the two-stage generation model comprises a semantic layout generation network and a texture generation network, so that effective transfer of textures is realized, and diversified fashion images are generated, and the two-stage generation model specifically comprises the following steps:
the first stage is as follows: semantic layout generation network
In the semantic layout generation network, our goal is to map the guiding pose p to a series of human semantic layouts {H_1, H_2, ..., H_N}. These semantic layouts provide sufficient a priori knowledge of the shape of the human body and the structure of the garment.
The network takes the pose information and the corresponding semantic information as input and learns to generate diversified semantic information. A simple UNet can also produce a corresponding semantic output, but it cannot meet the diversity requirement; the semantic layout generation network is therefore built on the BicycleGAN model, which encourages multiple outputs to be generated from a single source image in image-to-image translation. The semantic layout generation network comprises a conditional variational auto-encoding sub-network and a conditional latent regression sub-network.
The conditional variational auto-encoding sub-network takes the pose information and the semantic information together as input, processes the semantic information with an encoder to obtain a latent vector of control features, and then feeds the latent vector and the pose information into a generator to produce the corresponding reconstructed semantic information. A KL loss constrains the latent vector to follow a Gaussian distribution, which facilitates sampling at test time.
The conditional latent regression sub-network takes the pose information and a randomly sampled Gaussian vector as generator input and produces a realistic semantic layout under the constraint of a discriminator; an encoder then processes the generated semantic layout, and an L1 loss between the recovered vector and the original Gaussian vector enforces a one-to-one mapping, thereby realizing diversified semantic outputs.
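The two latent-vector constraints described above (the KL loss of the variational branch and the L1 latent-recovery loss of the regression branch) can be written compactly. This is a hedged NumPy sketch assuming a diagonal-Gaussian encoder that outputs a mean and a log-variance; the patent does not specify this parameterization:

```python
import numpy as np

def kl_loss(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch.
    Constrains the encoded latent vector toward a standard Gaussian so
    that it can be sampled directly at test time."""
    return 0.5 * np.mean(np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1))

def latent_l1_loss(z_recovered, z_sampled):
    """L1 constraint between the vector recovered by the encoder from the
    generated layout and the originally sampled Gaussian vector,
    encouraging a one-to-one mapping between z and output."""
    return np.abs(z_recovered - z_sampled).mean()

# An encoder output that already matches N(0, I) incurs zero KL penalty.
mu = np.zeros((4, 8)); logvar = np.zeros((4, 8))
```

Together these two terms are what lets the model trade off faithful reconstruction against diverse sampling, as in BicycleGAN.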
And a second stage: texture generation network
In the texture generation network, the aim is to render texture on the semantic layout produced by the semantic layout generation network: the synthesized texture of the garment region must be consistent with the guiding texture example, and the synthesized human appearance must be perceptually convincing. The diversified semantic layouts output by the semantic layout generation network provide multimodal inputs for our texture generation network.
Texture generation for the upper and lower garments is handled separately, with the upper and lower garments generated respectively. The network takes the texture block area mask and the clothing area mask as input and is realized by an encoder, a texture generation block, a decoder and a Patch-GAN discriminator. The encoder encodes the input texture block, the texture generation block transfers local texture features to the corresponding garment region, and the decoder decodes the reconstructed features into the corresponding fashion image. To make the generated fashion image more realistic, the Patch-GAN discriminator is trained jointly with the encoder, texture generation block and decoder.
The encoder of the texture generation network:
the Encoder adopts a common Encoder structure to decode the input texture block, and compared with other methods, partial convolution is used in the Encoder to replace a standard convolution layer, so that artifacts such as blurring and color difference are avoided. The partial convolution at each position is expressed as:
wherein X is the characteristic value of the current convolution (sliding) window, M is the binary mask of the texture block area mask corresponding to the current convolution window, W is the weight of the convolution filter, and b is the offset. sum (M) is the number of 1's in the binary mask.
After each partial convolution operation, updating the mask by marking the corresponding position of the mask after the window convolution operation as valid if at least one valid input value exists in the binary mask of the current convolution window, and expressing as follows:
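The partial convolution and mask update above can be sketched for a single window position. This is a minimal NumPy illustration under the stated definitions, not the patent's implementation:

```python
import numpy as np

def partial_conv2d(X, M, W, b):
    """Single-channel partial convolution at one window position.

    X : feature values in the current (sliding) window
    M : binary mask for the window (1 = valid texture pixel)
    W : convolution filter weights, same shape as X
    b : scalar bias
    Returns the output value and the updated mask bit.
    """
    valid = M.sum()
    if valid > 0:
        # Scale by sum(1)/sum(M) to compensate for masked-out inputs.
        out = (W * (X * M)).sum() * (M.size / valid) + b
        return out, 1  # at least one valid input -> mask position becomes valid
    return 0.0, 0

# A 3x3 window where only one texture pixel is valid:
X = np.ones((3, 3))
M = np.zeros((3, 3)); M[0, 0] = 1
W = np.full((3, 3), 0.5)
y, m_new = partial_conv2d(X, M, W, b=0.0)
```

With a single valid input, the scale factor 9/1 re-normalizes the one contributing product, so the result is independent of how many window pixels were masked out.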
the texture generation block of the texture generation network:
we have found that previous work has achieved the effect of texture generation using solely convolution to model the correlation between different image regions. However, because the convolution operation has a local acceptance domain, the long-distance dependency relationship must be processed through several convolution layers, the learning effect is not good, and the texture generation effect is difficult to realize. We introduce a texture generation block that reconstructs the texture features of the existing encoder output by using an attention map. And (3) forming a similarity matrix by calculating the cosine value similarity among the texture feature blocks, and activating by using a softmax function to obtain an attention map, so that feature information is copied from the existing texture feature blocks, and the texture of the missing part of the garment is generated. To better learn the correlation between textures, i use features one layer higher than the reconstructed features to compute cosine similarity between features. The similarity matrix is calculated as follows:
andrespectively extracted texture featuresThe ith texture feature block and the jth texture feature block in the block, andis composed ofAndis scored. We apply the softmax function to activate and obtain the initial attention map of the ith texture feature block
From texture features according to similarity calculation formulaInitial attention map AS for extracting whole texture featureslWe then use an attention-seeking scheme to reconstruct each block within the texture feature separately by a deconvolution operation:
wherein the content of the first and second substances,is the ith block extracted within the texture feature,is the jth block extracted within the texture feature. Reconstructing all blocks through attention scores to finally obtain reconstructed featuresWherein L is E [1, L-1 ∈]L is the characteristic number output by the encoder, and L is the corresponding characteristic layer serial number. After that time, the user can use the device,further refinement is achieved by four sets of dilation convolutions at different rates.
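The attention-based reconstruction above amounts to a softmax over cosine similarities followed by a weighted copy of feature blocks. A minimal NumPy sketch in matrix form (rather than the per-block deconvolution described above):

```python
import numpy as np

def texture_attention(blocks):
    """blocks: (N, D) array of flattened texture feature blocks.
    Returns (attn, recon): the softmax attention map built from cosine
    similarities, and each block reconstructed as an attention-weighted
    combination of all blocks."""
    normed = blocks / (np.linalg.norm(blocks, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                        # cosine similarity s_{i,j}
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    attn = e / e.sum(axis=1, keepdims=True)        # softmax over j
    recon = attn @ blocks                          # weighted copy of blocks
    return attn, recon

blocks = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
attn, recon = texture_attention(blocks)
```

Because every output block is a convex combination of input blocks, the mechanism literally copies existing texture information into regions that lack it, which is the intended behavior for filling missing garment texture.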
The decoder of the texture generation network:
the SPADE structure (space adaptive normalization method) and the Decoder structure are combined, so that the introduction of human body information is realized, the generated clothing shape is further constrained through semantic information, and the characteristics generated by the reconstructed texture after coding and the semantic information are combined and decoded into a corresponding fashion image. The calculation process of the spatial adaptive normalization is as follows:
wherein, for the input semantic layout HsExtracting features by convolution, and obtaining normalized scaling coefficient by two convolution layers respectivelyAnd bias termWherein x, y and c are the height, width and channel number of the feature respectively, and n is the number of samples participating in training.Andare respectively input featuresMean and standard deviation of. The calculation formula is as follows, and this part is the same as the calculation in BN.
H, W, C are the height, width, and number of channels, respectively, of the semantic layout input. x, y and c are respectively the height, width and channel number of the input feature, and n is the number of samples participating in training. N is the number of samples involved in training.
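The spatially-adaptive normalization above can be sketched as follows, assuming the γ and β maps have already been predicted from the semantic layout by convolutions (a simplified NumPy illustration, not the patent's implementation):

```python
import numpy as np

def spade(h, gamma, beta, eps=1e-5):
    """h: (N, C, H, W) input features; gamma, beta: (C, H, W) modulation
    maps predicted from the semantic layout by convolutions. Normalizes
    each channel over (N, H, W) as in batch norm, then applies the
    spatially varying scale and bias."""
    mu = h.mean(axis=(0, 2, 3), keepdims=True)
    sigma = h.std(axis=(0, 2, 3), keepdims=True)
    return gamma[None] * (h - mu) / (sigma + eps) + beta[None]

rng = np.random.default_rng(0)
h = rng.standard_normal((2, 3, 4, 4))
out = spade(h, gamma=np.ones((3, 4, 4)), beta=np.zeros((3, 4, 4)))
```

Unlike plain batch normalization, the scale and bias vary per spatial location, which is what lets the semantic layout re-inject the garment shape that normalization would otherwise wash out.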
Step (3): build a deep learning framework and train the semantic layout generation network and the texture generation network on the collected data set by minimizing adversarial loss, cross-entropy loss, pixel-level loss, perceptual loss and style loss. The specifics are as follows:
because the details of fashion images are complex, how to train the generator well is a great challenge. To solve this problem, we use multiple penalties for training from different aspects, namely, antagonism penalties, cross-entropy penalties, pixel-level penalties, perceptual penalties, and gram matrix-based style penalties.
The overall loss of the semantic layout generation network is defined as follows:

L_layout = λ_vae L_GAN^VAE + λ_seg L_seg + λ_kl L_KL + λ_gan L_GAN + λ_latent L_latent

The first three terms correspond to the objective of the conditional variational auto-encoder GAN, and the last two terms to the objective of the conditional latent regression GAN. λ_vae = 2, λ_seg = 3, λ_kl = 0.01, λ_gan = 2 and λ_latent = 30 are the weights of the loss terms. Unlike the original BicycleGAN model, we use a softmax activation at the last layer of the generator and a cross-entropy loss to predict the human semantic layout. In semantic layout transformation, the cross-entropy loss constrains pixel-level precision and is defined as:

L_seg = − (1 / (H·W·C)) Σ_{x,y,c} Ĥ_s(x, y, c) · log H_s(x, y, c)

where H, W and C are the height, width and number of channels of the semantic layout, H_s is the generated semantic layout, and Ĥ_s is the corresponding ground-truth semantic layout.
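The pixel-level cross-entropy above can be sketched as follows (a NumPy illustration assuming softmax probabilities and a one-hot ground-truth layout):

```python
import numpy as np

def semantic_ce_loss(pred, target, eps=1e-8):
    """Mean pixel-level cross entropy over all H*W*C positions.
    pred: (H, W, C) softmax probabilities of the generated layout H_s;
    target: (H, W, C) one-hot ground-truth layout."""
    return -(target * np.log(np.clip(pred, eps, 1.0))).sum() / pred.size

# One-hot ground truth for a 2x2 layout with 3 labels.
target = np.eye(3)[np.array([[0, 1], [2, 0]])]
uniform = np.full_like(target, 1.0 / 3.0)
```

A perfect prediction gives zero loss, while a uniform (maximally uncertain) prediction is penalized, which is the pixel-level precision constraint the text describes.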
The overall loss of the texture generation network is defined as follows:

L_texture = λ_adv L_adv + λ_rec L_rec + λ_per L_per + λ_sty L_sty

where L_adv is the adversarial loss, L_rec is the L1 loss between the generated fashion image Î and the real image I, L_per is the perceptual loss between Î and I, and L_sty is the style loss between Î and I. λ_adv = 0.1, λ_rec = 6, λ_per = 0.5 and λ_sty = 50 are the weights of the loss terms.
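Of the four terms, the Gram-matrix style loss is the least standard to write down; a minimal NumPy sketch follows (our own illustration; the patent does not specify the exact normalization or the distance used between Gram matrices):

```python
import numpy as np

def gram_matrix(feat):
    """feat: (C, H, W) feature map -> (C, C) Gram matrix of channel
    correlations, normalized by the number of elements."""
    C, H, W = feat.shape
    F = feat.reshape(C, H * W)
    return F @ F.T / (C * H * W)

def style_loss(feat_gen, feat_real):
    """Distance between Gram matrices of generated and real features;
    matching channel correlations matches texture statistics."""
    return np.abs(gram_matrix(feat_gen) - gram_matrix(feat_real)).mean()

rng = np.random.default_rng(0)
f_real = rng.standard_normal((4, 8, 8))
f_gen = rng.standard_normal((4, 8, 8))
```

Because the Gram matrix discards spatial arrangement and keeps only channel co-activations, this loss pushes the generated garment toward the texture statistics of the guide rather than a pixel-exact copy.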
The invention has the beneficial effects that:
the invention provides a method for designing and synthesizing fashion clothes guided by postures and textures aiming at the practical problems of poor and single generation effect of the existing fashion images, solves the problems of too little information contained in the guided postures, locality and inaccuracy of texture transfer and single generation of the fashion images in the existing method, and realizes the generation of the diversity and the accuracy of the fashion images to a great extent. In addition, the task of combining artificial intelligence and fashion is taken as a current research hotspot, the reasonable use also enables the invention to have more advanced and innovative scientific research, and corresponding real and various fashion images are automatically designed and generated according to input control conditions (posture information and texture information) of a plurality of modes, so that the design inspiration of clothing designers can be further stimulated, and the development and application popularization of creative design related research in the fashion field can be promoted.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a semantic layout generating network model in the method of the present invention.
FIG. 3 is a model of a texture transfer network in the method of the present invention.
FIG. 4 is a schematic of the data set of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples.
The invention provides a method for pose- and texture-guided fashion garment design synthesis.
As shown in fig. 1, a method for pose- and texture-guided fashion garment design synthesis comprises the following steps:
Step (1): collect task data from an existing fashion data set, preprocess the data, and construct a data set of fashion images, pose information and semantic information.
Step (2): with the generation of natural and accurate fashion images as the goal, construct a two-stage generation model on the existing fashion data set; the generation model comprises a semantic layout generation network and a texture generation network, realizing effective transfer of textures and generating diversified fashion images.
Step (3): train the semantic layout generation network and the texture transfer network on the collected data set by minimizing adversarial loss, cross-entropy loss, pixel-level loss, perceptual loss and style loss.
Step (4): train the network parameters of the generation model of step (3) by back-propagation until the whole model converges, then generate the corresponding fashion images.
Collecting task data with an existing fashion data set in step (1) means that we evaluate our method on the Fashion-Gen data set, because it contains various complex garment textures. We selected 4 major garment categories (i.e., dress, shirt, sweater, and coat) from the 48 fashion categories in the Fashion-Gen data set for evaluation.
Constructing the data set of fashion images, pose information and semantic information in step (1) means that, for each fashion image, the person's pose is estimated with a state-of-the-art pose estimator; the computed pose information comprises 18 joint coordinate points. In addition, an advanced human parser is used to compute human semantic information containing 20 labels, each representing a specific part of the body, such as the face, hair, arms, legs and clothing regions.
Constructing a two-stage generation model in the step (2), wherein the two-stage generation model comprises a semantic layout generation network and a texture generation network, so that effective transfer of textures is realized, and diversified fashion images are generated, and the two-stage generation model specifically comprises the following steps:
as shown in fig. 2, the first stage: semantic layout generation network
In the semantic layout generation network, our goal is to map the guiding pose p to a series of human semantic layouts {H_1, H_2, ..., H_N}. These semantic layouts provide sufficient a priori knowledge of the shape of the human body and the structure of the garment.
The network takes the pose information and the corresponding semantic information as input and learns to generate diversified semantic information. A simple UNet can also produce a corresponding semantic output, but it cannot meet the diversity requirement; the semantic layout generation network is therefore built on the BicycleGAN model, which encourages multiple outputs to be generated from a single source image in image-to-image translation. The semantic layout generation network comprises a conditional variational auto-encoding sub-network and a conditional latent regression sub-network.
The conditional variational auto-encoding sub-network takes the pose information and the semantic information together as input, processes the semantic information with an encoder to obtain a latent vector of control features, and then feeds the latent vector and the pose information into a generator to produce the corresponding reconstructed semantic information. A KL loss constrains the latent vector to follow a Gaussian distribution, which facilitates sampling at test time.
The conditional latent regression sub-network takes the pose information and a randomly sampled Gaussian vector as generator input and produces a realistic semantic layout under the constraint of a discriminator; an encoder then processes the generated semantic layout, and an L1 loss between the recovered vector and the original Gaussian vector enforces a one-to-one mapping, thereby realizing diversified semantic outputs.
As shown in fig. 3, the second stage: texture generation network
In the texture generation network, the aim is to render texture on the semantic layout produced by the semantic layout generation network: the synthesized texture of the garment region must be consistent with the guiding texture example, and the synthesized human appearance must be perceptually convincing. The diversified semantic layouts output by the semantic layout generation network provide multimodal inputs for our texture generation network.
Texture generation for the upper and lower garments is handled separately, with the upper and lower garments generated respectively. The network takes the texture block area mask and the clothing area mask as input and is realized by an encoder, a texture generation block, a decoder and a Patch-GAN discriminator. The encoder encodes the input texture block, the texture generation block transfers local texture features to the corresponding garment region, and the decoder decodes the reconstructed features into the corresponding fashion image. To make the generated fashion image more realistic, the Patch-GAN discriminator is trained jointly with the encoder, texture generation block and decoder.
The encoder of the texture generation network:
the Encoder adopts a common Encoder structure to decode the input texture block, and compared with other methods, partial convolution is used in the Encoder to replace a standard convolution layer, so that artifacts such as blurring and color difference are avoided. The partial convolution at each position is expressed as:
wherein X is the characteristic value of the current convolution (sliding) window, M is the binary mask of the texture block area mask corresponding to the current convolution window, W is the weight of the convolution filter, and b is the offset. sum (M) is the number of 1's in the binary mask.
After each partial convolution operation, updating the mask by marking the corresponding position of the mask after the window convolution operation as valid if at least one valid input value exists in the binary mask of the current convolution window, and expressing as follows:
the texture generation block of the texture generation network:
we have found that previous work has achieved the effect of texture generation using solely convolution to model the correlation between different image regions. However, because the convolution operation has a local acceptance domain, the long-distance dependency relationship must be processed through several convolution layers, the learning effect is not good, and the texture generation effect is difficult to realize. We introduce a texture generation block that reconstructs the texture features of the existing encoder output by using an attention map. And (3) forming a similarity matrix by calculating the cosine value similarity among the texture feature blocks, and activating by using a softmax function to obtain an attention map, so that feature information is copied from the existing texture feature blocks, and the texture of the missing part of the garment is generated. To better learn the correlation between textures, i use features one layer higher than the reconstructed features to compute cosine similarity between features. The similarity matrix is calculated as follows:
andrespectively extracted texture featuresThe ith texture feature block and the jth texture feature block in the block, andis composed ofAndis scored. We apply the softmax function to activate and obtain the initial attention map of the ith texture feature block
From texture features according to similarity calculation formulaInitial attention map AS for extracting whole texture featureslWe then use an attention-seeking scheme to reconstruct each block within the texture feature separately by a deconvolution operation:
wherein the content of the first and second substances,is the ith block extracted within the texture feature,is the jth block extracted within the texture feature. Reconstructing all blocks through attention scores to finally obtain reconstructed featuresWherein L is E [1, L-1 ∈]L is the characteristic number output by the encoder, and L is the corresponding characteristic layer serial number. After that time, the user can use the device,further refinement is achieved by four sets of dilation convolutions at different rates.
The decoder of the texture generation network:
the SPADE structure (space adaptive normalization method) and the Decoder structure are combined, so that the introduction of human body information is realized, the generated clothing shape is further constrained through semantic information, and the characteristics generated by the reconstructed texture after coding and the semantic information are combined and decoded into a corresponding fashion image. The calculation process of the spatial adaptive normalization is as follows:
wherein, for the input semantic layout HsBy convolutional extractionTaking characteristics, and obtaining normalized scaling coefficient through two convolution layers respectivelyAnd bias termWherein x, y and c are the height, width and channel number of the feature respectively, and n is the number of samples participating in training.Andare respectively input featuresMean and standard deviation of. The calculation formula is as follows, and this part is the same as the calculation in BN.
H, W, C are the height, width, and number of channels, respectively, of the semantic layout input. x, y and c are respectively the height, width and channel number of the input feature, and n is the number of samples participating in training. N is the number of samples involved in training.
Step (3): build a deep learning framework and, as shown in FIG. 3, train the semantic layout generation network and the texture generation network on the collected data set by minimizing adversarial loss, cross-entropy loss, pixel-level loss, perceptual loss and style loss. The specifics are as follows:
Because the details of fashion images are complex, training the generator well is a great challenge. To solve this problem, we train with multiple losses that constrain different aspects, namely an adversarial loss, a cross-entropy loss, a pixel-level loss, a perceptual loss, and a Gram-matrix-based style loss.
The overall loss of the semantic layout generation network is defined as follows:

L_layout = λ_vae·L_GAN^VAE + λ_seg·L_seg + λ_kl·L_KL + λ_gan·L_GAN + λ_latent·L_latent
The first three terms correspond to the objective of the conditional variational autoencoder GAN, and the last two terms correspond to the objective of the conditional latent regressor GAN. λ_vae = 2, λ_seg = 3, λ_kl = 0.01, λ_gan = 2, and λ_latent = 30 are the weights of the loss terms. Unlike the original BicycleGAN model, we use a softmax activation at the last layer of the generator and adopt a cross-entropy loss to predict the human semantic layout. In the semantic layout transformation, the cross-entropy loss constrains pixel-level precision and is defined as:

L_seg = −(1/(H·W·C)) Σ_{x,y,c} Ĥ_s(x,y,c) · log H_s(x,y,c)
H, W, C are the height, width, and number of channels of the semantic layout input; H_s is the generated semantic layout, and Ĥ_s is the corresponding ground-truth semantic layout.
The overall loss of the texture generation network is defined as follows:

L_texture = λ_adv·L_adv + λ_rec·L_rec + λ_per·L_per + λ_sty·L_sty
where L_adv is the adversarial loss, L_rec is the L1 loss between the generated fashion image Î and the real image I, L_per is the perceptual loss between Î and I, and L_sty is the style loss between Î and I. λ_adv = 0.1, λ_rec = 6, λ_per = 0.5, and λ_sty = 50 are the weights of the loss terms.
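Two of the four loss terms above, the L1 reconstruction loss and the Gram-matrix style loss, can be sketched as follows. The adversarial and perceptual terms need the discriminator and a pretrained feature extractor and are omitted; for illustration the Gram matrices are computed on the images themselves rather than on network features, which is an assumption of this sketch:

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map, as used by the style loss."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def texture_loss_sketch(gen, real, lam_rec=6.0, lam_sty=50.0):
    """Weighted sum of the L1 reconstruction term and the Gram-based
    style term of the texture network loss, using the weights stated
    above (lam_rec = 6, lam_sty = 50)."""
    l_rec = np.abs(gen - real).mean()                          # L1 loss
    l_sty = np.abs(gram_matrix(gen) - gram_matrix(real)).mean()  # style loss
    return lam_rec * l_rec + lam_sty * l_sty
```

The Gram matrix discards spatial layout and keeps only channel co-activation statistics, which is why this term matches texture style rather than pixel positions.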
Claims (7)
1. A method for synthesizing fashion garment design guided by posture and texture is characterized by comprising the following steps:
step (1), collecting task data by means of an existing fashion data set, preprocessing the data, and constructing a data set of fashion images, pose information, and semantic information;
step (2), constructing a two-stage generation model by taking the generation of natural and accurate fashion images as a target under the existing fashion data set; the generation model comprises a semantic layout generation network and a texture generation network, so that effective transfer of textures is realized, and diversified fashion images are generated;
step (3), training the semantic layout generation network and the texture generation network with the collected data set by minimizing adversarial loss, cross-entropy loss, pixel-level loss, perceptual loss, and style loss;
and (4) training the network parameters of the generative model in step (3) through a back-propagation algorithm until the whole model converges, so as to generate the corresponding fashion image.
2. The method of claim 1, wherein the fashion image, pose information, and semantic information data set constructed in step (1) is obtained by estimating the pose of the person in each fashion image with a state-of-the-art pose estimator, the computed pose information of the person comprising 18 joint coordinate points; in addition, an advanced human parser is used to compute human semantic information containing 20 labels, each label representing a specific part of the body, such as the face, hair, arms, legs, and clothing regions.
3. The method of claim 2, wherein the two-stage generative model constructed in step (2) comprises a semantic layout generation network and a texture generation network, achieving effective texture transfer and generating diversified fashion images; the semantic layout generation network is specifically implemented as follows:
in the semantic layout generation network, the goal is to map the guide pose p to a series of human semantic layouts {H_1, H_2, ..., H_N}; these semantic layouts provide sufficient a priori knowledge of the human body shape and the garment structure;
using the pose information and the corresponding semantic information as input, and learning to generate diversified semantic information; the semantic layout generation network is built on the BicycleGAN model and comprises a conditional variational auto-encoding sub-network and a conditional latent regressor sub-network;
the conditional variational auto-encoding sub-network takes the pose information and the semantic information together as input, processes the semantic information with an encoder to obtain a latent vector of controlling features, and then feeds the latent vector together with the pose information into the generator to produce the corresponding reconstructed semantic information; a KL loss constrains the latent vector to follow a Gaussian distribution, which facilitates sampling at test time;
the conditional latent regressor sub-network takes the pose information and a randomly sampled Gaussian vector as the input of the generator, generates a realistic semantic layout under the constraint of the discriminator, processes the generated semantic layout with an encoder, and uses an L1 loss to constrain the recovered vector to match the original Gaussian vector, ensuring a one-to-one mapping and thereby producing diversified semantic information outputs.
4. A method of pose and texture guided fashion garment design synthesis according to claim 3, characterized by a second stage: the texture generation network is specifically implemented as follows:
in the texture generation network, the aim is to generate texture on the semantic layout produced by the semantic layout generation network, where the synthesized texture of the clothing region is required to be consistent with the guide texture example, and the synthesized human appearance is required to be perceptually convincing; the diversified semantic layout outputs of the semantic layout generation network provide multi-modal input for the texture generation network;
the texture generation of the upper and lower garments is processed separately: the upper and lower garments are generated respectively, the texture patch region mask and the clothing region mask serve as input, and the texture generation network is realized through an encoder, a texture generation block, a decoder, and a Patch-GAN discriminator; the encoder encodes the input texture patch, the texture generation block transfers local texture features to the corresponding clothing region, and the decoder decodes the reconstructed features into the corresponding fashion image; meanwhile, the Patch-GAN discriminator is trained jointly with the encoder, the texture generation block, and the decoder;
the encoder of the texture generation network:
the Encoder adopts a common encoder structure to encode the input texture patch; compared with other methods, partial convolution is used in the encoder in place of the standard convolution layers, which avoids blur and color discrepancy; the partial convolution at each position is expressed as:

x' = W^T (X ⊙ M) · sum(1)/sum(M) + b, if sum(M) > 0;  x' = 0, otherwise,

where sum(1) is the number of elements in the current window;
wherein X contains the feature values in the current convolution (sliding) window, M is the binary mask of the texture patch region mask corresponding to the current convolution window, W is the weight of the convolution filter, and b is the bias (offset); sum(M) is the number of ones in the binary mask;
after each partial convolution operation the mask is updated: if at least one valid input value exists in the binary mask of the current convolution window, the mask position corresponding to the window after the convolution operation is marked as valid, expressed as:

m' = 1, if sum(M) > 0;  m' = 0, otherwise.
5. A method of pose and texture guided fashion garment design synthesis as claimed in claim 3, characterized in that, entering the texture generation block, the texture features output by the encoder are reconstructed by using an attention map; the similarity matrix is formed by computing the cosine similarity among the texture feature patches, and the attention map is obtained through softmax activation, so that feature information is copied from the existing texture feature patches to generate the texture of the missing garment parts; to better learn the degree of correlation between textures, the cosine similarity is computed on features one layer higher than the reconstructed features; the similarity matrix is calculated as follows:

s^l_{i,j} = ⟨ f^l_i / ‖f^l_i‖ , f^l_j / ‖f^l_j‖ ⟩;
andrespectively extracted texture featuresThe ith texture feature block and the jth texture feature block in the block, andis composed ofAnd(ii) similarity score of (d); applying softmax function to activate and obtain initial attention diagram of ith texture feature block
according to the similarity calculation formula, the initial attention map AS^l of the whole texture feature is extracted from the texture feature F^l; then each patch within the texture feature is reconstructed separately by a deconvolution operation using the attention map:

f̃_i = Σ_j A^l_{i,j} · f_j;
wherein f_i is the i-th patch extracted within the texture feature, and f_j is the j-th patch extracted within the texture feature; all patches are reconstructed through the attention scores, finally obtaining the reconstructed feature F̃^l, wherein l ∈ [1, L−1], L is the number of features output by the encoder, and l is the index of the corresponding feature layer; afterwards, F̃^l is further refined by four groups of dilated convolutions with different dilation rates.
6. A pose and texture guided fashion garment design synthesis method according to claim 4 or 5, characterized by the texture generation network decoder:
the SPADE structure is combined with the Decoder structure to introduce human body information; the semantic information further constrains the generated garment shape, and the encoded, texture-reconstructed features are combined with the semantic information and decoded into the corresponding fashion image; the spatially-adaptive normalization is computed as:

F'_{n,c,x,y} = γ_{x,y,c}(H_s) · (F_{n,c,x,y} − μ_c) / σ_c + β_{x,y,c}(H_s);
wherein, for the input semantic layout H_s, features are extracted by convolution, and the normalized scaling coefficient γ_{x,y,c}(H_s) and bias term β_{x,y,c}(H_s) are each obtained through a separate convolution layer, wherein x, y, c index the height, width, and channel of the feature, and n indexes the samples participating in training; μ_c and σ_c are respectively the mean and standard deviation of the input feature F, computed as follows, exactly as in BN:

μ_c = (1/(N·H·W)) Σ_{n,x,y} F_{n,c,x,y},  σ_c = √((1/(N·H·W)) Σ_{n,x,y} (F_{n,c,x,y} − μ_c)² + ε);
H, W, C are respectively the height, width, and number of channels of the semantic layout input; x, y, and c index the height, width, and channel of the input feature; N is the number of samples participating in training.
7. The method of claim 6, wherein step (3) is implemented as follows:
training with multiple losses that constrain different aspects, namely an adversarial loss, a cross-entropy loss, a pixel-level loss, a perceptual loss, and a Gram-matrix-based style loss;
the overall loss of the semantic layout generation network is defined as follows:

L_layout = λ_vae·L_GAN^VAE + λ_seg·L_seg + λ_kl·L_KL + λ_gan·L_GAN + λ_latent·L_latent;
wherein the first three terms correspond to the objective of the conditional variational autoencoder GAN, and the last two terms correspond to the objective of the conditional latent regressor GAN; λ_vae = 2, λ_seg = 3, λ_kl = 0.01, λ_gan = 2, and λ_latent = 30 are the weights of the loss terms; the last layer of the generator is activated by softmax, and the human semantic layout is predicted with a cross-entropy loss; in the semantic layout transformation, the cross-entropy loss constrains pixel-level precision and is defined as:

L_seg = −(1/(H·W·C)) Σ_{x,y,c} Ĥ_s(x,y,c) · log H_s(x,y,c);
H, W, C are respectively the height, width, and number of channels of the semantic layout input; H_s is the generated semantic layout, and Ĥ_s is the corresponding ground-truth semantic layout;
the overall loss of the texture generation network is defined as follows:

L_texture = λ_adv·L_adv + λ_rec·L_rec + λ_per·L_per + λ_sty·L_sty;
wherein L_adv is the adversarial loss, L_rec is the L1 loss between the generated fashion image Î and the real image I, L_per is the perceptual loss between Î and I, and L_sty is the style loss between Î and I; λ_adv = 0.1, λ_rec = 6, λ_per = 0.5, and λ_sty = 50 are the weights of the loss terms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110660701.8A CN113393550B (en) | 2021-06-15 | 2021-06-15 | Fashion garment design synthesis method guided by postures and textures |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113393550A true CN113393550A (en) | 2021-09-14 |
CN113393550B CN113393550B (en) | 2022-09-20 |
Family
ID=77621042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110660701.8A Active CN113393550B (en) | 2021-06-15 | 2021-06-15 | Fashion garment design synthesis method guided by postures and textures |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113393550B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325952A (en) * | 2018-09-17 | 2019-02-12 | 上海宝尊电子商务有限公司 | Fashion clothing image partition method based on deep learning |
CN109559287A (en) * | 2018-11-20 | 2019-04-02 | 北京工业大学 | A kind of semantic image restorative procedure generating confrontation network based on DenseNet |
US20200151807A1 (en) * | 2018-11-14 | 2020-05-14 | Beijing Jingdong Shangke Information Technology Co., Ltd. | System and method for automatically generating three-dimensional virtual garment model using product description |
CN111445426A (en) * | 2020-05-09 | 2020-07-24 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Target garment image processing method based on generation countermeasure network model |
CN111476241A (en) * | 2020-03-04 | 2020-07-31 | 上海交通大学 | Character clothing conversion method and system |
US20210065418A1 (en) * | 2019-08-27 | 2021-03-04 | Shenzhen Malong Technologies Co., Ltd. | Appearance-flow-based image generation |
Non-Patent Citations (6)
Title |
---|
XIAOLING GU; FEI GAO; MIN TAN; PAI PENG: "Fashion Analysis and Understanding with Artificial Intelligence", Information Processing & Management *
XIAOLING GU; JUN YU; YONGKANG WONG; MOHAN S. KANKANHALLI: "Toward Multi-Modal Conditioned Fashion Image Translation", IEEE Transactions on Multimedia *
XU JUNZHE; CHEN JIA; HE RUHAN; HU XINRONG: "Research on Pose-Based Fashion Image Synthesis", Modern Computer *
LI QIANG et al.: "Clothing Key Point Localization Algorithm Based on Cascaded Convolutional Neural Networks", Journal of Tianjin University (Science and Technology) *
HUANG FEI et al.: "Heterogeneous Face Image Synthesis Based on Generative Adversarial Networks: Progress and Challenges", Journal of Nanjing University of Information Science & Technology (Natural Science Edition) *
HUANG TAO et al.: "Text-Guided Person Image Editing Method Based on Generative Adversarial Networks", Journal of Guangdong Polytechnic Normal University *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113838166A (en) * | 2021-09-22 | 2021-12-24 | 网易(杭州)网络有限公司 | Image feature migration method and device, storage medium and terminal equipment |
CN113838166B (en) * | 2021-09-22 | 2023-08-29 | 网易(杭州)网络有限公司 | Image feature migration method and device, storage medium and terminal equipment |
CN114723843A (en) * | 2022-06-01 | 2022-07-08 | 广东时谛智能科技有限公司 | Method, device, equipment and storage medium for generating virtual clothing through multi-mode fusion |
CN114723843B (en) * | 2022-06-01 | 2022-12-06 | 广东时谛智能科技有限公司 | Method, device, equipment and storage medium for generating virtual clothing through multi-mode fusion |
CN115147526A (en) * | 2022-06-30 | 2022-10-04 | 北京百度网讯科技有限公司 | Method and device for training clothing generation model and method and device for generating clothing image |
CN115147526B (en) * | 2022-06-30 | 2023-09-26 | 北京百度网讯科技有限公司 | Training of clothing generation model and method and device for generating clothing image |
CN115659852A (en) * | 2022-12-26 | 2023-01-31 | 浙江大学 | Layout generation method and device based on discrete potential representation |
CN116229229A (en) * | 2023-05-11 | 2023-06-06 | 青岛科技大学 | Multi-domain image fusion method and system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN113393550B (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113393550B (en) | Fashion garment design synthesis method guided by postures and textures | |
CN110211196B (en) | Virtual fitting method and device based on posture guidance | |
Zhang et al. | Pise: Person image synthesis and editing with decoupled gan | |
CN111275518A (en) | Video virtual fitting method and device based on mixed optical flow | |
CN108288072A (en) | A kind of facial expression synthetic method based on generation confrontation network | |
Kolotouros et al. | Dreamhuman: Animatable 3d avatars from text | |
Tang et al. | Multi-channel attention selection gans for guided image-to-image translation | |
US11282256B2 (en) | Crowdshaping realistic 3D avatars with words | |
CN113496507A (en) | Human body three-dimensional model reconstruction method | |
Li et al. | Learning symmetry consistent deep cnns for face completion | |
Sheng et al. | Deep neural representation guided face sketch synthesis | |
CN113255457A (en) | Animation character facial expression generation method and system based on facial expression recognition | |
CN111476241B (en) | Character clothing conversion method and system | |
CN113538608B (en) | Controllable figure image generation method based on generation countermeasure network | |
WO2023088277A1 (en) | Virtual dressing method and apparatus, and device, storage medium and program product | |
CN111462274A (en) | Human body image synthesis method and system based on SMP L model | |
Zeng et al. | Avatarbooth: High-quality and customizable 3d human avatar generation | |
Du et al. | VTON-SCFA: A virtual try-on network based on the semantic constraints and flow alignment | |
Kwolek et al. | Recognition of JSL fingerspelling using deep convolutional neural networks | |
CN113076918A (en) | Video-based facial expression cloning method | |
Liu et al. | Multimodal face aging framework via learning disentangled representation | |
CN116777738A (en) | Authenticity virtual fitting method based on clothing region alignment and style retention modulation | |
CN116168186A (en) | Virtual fitting chart generation method with controllable garment length | |
Kuo et al. | Generating ambiguous figure-ground images | |
Kim et al. | Development of an IGA-based fashion design aid system with domain specific knowledge |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |