CN116452291A - Virtual fitting method, virtual fitting device, electronic equipment and storage medium - Google Patents

Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Info

Publication number
CN116452291A
CN116452291A
Authority
CN
China
Prior art keywords
image
semantic
clothing
module
virtual fitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310383484.1A
Other languages
Chinese (zh)
Inventor
张少林
张超速
石园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weifu Vision Beijing Technology Co ltd
Shenzhen Wave Kingdom Co ltd
Original Assignee
Weifu Vision Beijing Technology Co ltd
Shenzhen Wave Kingdom Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weifu Vision Beijing Technology Co ltd, Shenzhen Wave Kingdom Co ltd filed Critical Weifu Vision Beijing Technology Co ltd
Priority to CN202310383484.1A
Publication of CN116452291A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0641Shopping interfaces
    • G06Q30/0643Graphical representation of items or shoppers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping


Abstract

The embodiment of the invention provides a virtual fitting method, a virtual fitting device, an electronic device and a storage medium, and relates to the technical field of image processing. The method comprises the following steps: acquiring a character image to be changed, a clothing image and a human semantic rough segmentation image based on a real-time image; inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image; inputting the character image to be changed and the clothing image into a clothing appearance flow generation network to obtain a rough deformed clothing image; obtaining a clothing semantic part from the human semantic optimization image, and correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image; and inputting the fine deformed clothing image, the character image to be changed and the human semantic optimization image into the semantic generation model to obtain a virtual fitting result. The method greatly improves the practicality and reliability of virtual fitting.

Description

Virtual fitting method, virtual fitting device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a virtual fitting method, a virtual fitting device, an electronic device, and a storage medium.
Background
With the development of computer and internet technology, online shopping has become an increasingly popular way to shop. However, online shopping has drawbacks: clothing purchased online cannot be tried on before purchase as it can be in a physical store, so the purchased garment may not fit the consumer well or fully meet the consumer's needs.
Based on this, virtual fitting using computers has been studied. Mainstream virtual fitting techniques fall into two categories: virtual fitting based on three-dimensional reconstruction and virtual fitting based on deep learning. However, in the prior art, garment warping is generally unnatural and uneven, so the garment cannot fit the human body well and a good fitting result cannot be obtained.
Disclosure of Invention
In order to solve the technical problems, embodiments of the present application provide a virtual fitting method, a virtual fitting device, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a virtual fitting method, where the method includes:
acquiring a character image to be changed, a clothing image and a human semantic rough segmentation image based on the real-time image;
inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
inputting the character image to be changed and the clothing image into a clothing appearance flow generating network to obtain a rough deformed clothing image;
obtaining a clothing semantic part according to the human semantic optimization image, and correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image;
inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
In an embodiment, the acquiring the image of the character to be changed based on the real-time image includes:
acquiring the real-time image, and cutting the real-time image into a preset size to obtain a cut image;
and removing the background of the clipping image to obtain the character image to be changed.
In an embodiment, the semantic generation model includes a feature extraction module, a feature enhancement module, and a semantic generation module, and the inputting the image of the character to be changed, the clothing image, and the human semantic rough segmentation image into the semantic generation model, to obtain a human semantic optimized image, includes:
the character image to be changed, the clothing image and the human semantic rough segmentation image are processed through the feature extraction module to obtain a plurality of initial feature images;
fusing the plurality of initial feature images through the feature enhancement module to obtain an enhanced feature image;
and the enhanced feature map is passed through the semantic generation module to obtain the human body semantic optimization image.
In an embodiment, the feature enhancement module includes a channel attention block, a spatial attention block, and a jump connection structure;
the channel attention block is according to the formula
Extracting a channel attention feature, wherein,representing channel attention features, σ representing Sigmoid functions, avgPool representing average pooling, maxPool representing maximum pooling, MLP representing a multi-layer perceptron, and F representing the input of the feature enhancement module;
the spatial attention block is according to the formula
Extracting a spatial attention feature, wherein f 3×3 Representing a convolution operation with a convolution kernel size of 3 x 3,and->The feature maps obtained by the average pooling and the maximum pooling of the channel attention processed F in the channel direction are shown, respectively.
In an embodiment, the garment appearance flow generating network includes a first encoding module, a second encoding module, a self-attention module, an up-sampling module and a deformation module, and the inputting the image of the person to be changed and the image of the garment into the garment appearance flow generating network, to obtain a rough deformed garment image includes:
obtaining character features from the character image to be changed through the first encoding module, and obtaining clothing features from the clothing image through the second encoding module;
fusing the character features and the clothing features into a global style vector;
passing the global style vector through the self-attention module to obtain an enhancement vector;
passing the enhancement vector through the up-sampling module to obtain a clothing appearance flow;
and passing the clothing appearance flow through the deformation module to obtain the rough deformed clothing image.
In an embodiment, the human body semantic optimization image includes a face semantic part, a hairstyle semantic part, a neck semantic part, an arm semantic part, and a clothing semantic part, and the obtaining the clothing semantic part according to the human body semantic optimization image includes:
and separating the clothing semantic part from the human semantic optimization image.
In one embodiment, the correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image includes:
and carrying out multiplication operation on the semantic parts of the clothing and elements at the same position of the rough deformation clothing image one by one to obtain the fine deformation clothing image.
In a second aspect, embodiments of the present application provide a virtual fitting device, the device including:
the acquisition module is used for acquiring the character image to be changed, the clothing image and the human semantic rough segmentation image based on the real-time image;
the first generation module is used for inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
the second generation module is used for inputting the character image to be changed and the clothing image into a clothing appearance flow generation network to obtain a rough deformed clothing image;
the correction module is used for obtaining a clothing semantic part according to the human semantic optimization image, correcting the rough deformed clothing image according to the clothing semantic part and obtaining a fine deformed clothing image;
and the third generation module is used for inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the computer program executes the virtual fitting method provided in the first aspect when the processor runs.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when run on a processor, performs the virtual fitting method provided in the first aspect.
The virtual fitting method provided by the present application makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are thereby greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like elements are numbered alike in the various figures.
Fig. 1 is a schematic flow chart of a virtual fitting method according to an embodiment of the present application;
FIG. 2 shows a schematic flow chart of obtaining a human semantic optimized image according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a feature enhancement module according to an embodiment of the disclosure;
FIG. 4 is a schematic flow chart of a rough deformed garment image according to an embodiment of the present application;
FIG. 5 shows a schematic flow chart of a method for obtaining a fine deformed garment image according to an embodiment of the present application;
fig. 6 is a schematic flow chart of obtaining virtual fitting results according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a virtual fitting device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.
The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
In the following, the terms "comprises", "comprising", "having" and their cognates, as used in the various embodiments of the present application, are intended only to refer to a particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be interpreted as excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of this application belong. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments.
Example 1
The embodiment of the disclosure provides a virtual fitting method.
Specifically, referring to fig. 1, the virtual fitting method includes:
step S110, acquiring a character image to be changed, a clothing image and a human body semantic rough segmentation image based on the real-time image;
in an embodiment, the acquiring the image of the character to be changed based on the real-time image includes: acquiring the real-time image, and cutting the real-time image into a preset size to obtain a cut image; and removing the background of the clipping image to obtain the character image to be changed.
Specifically, the human body image may be acquired from a real-time image captured by an image pickup apparatus. The character image to be changed may be acquired in real time using an Intel RealSense camera, and we then crop the acquired image to a standard size of 256 × 192, consistent with the image size of the VTON virtual try-on dataset. To prevent the background from interfering with the fitting effect, we matte the person out of the background using PaddleSeg, the person matting algorithm of Baidu's PaddlePaddle. Since person images captured in real scenes tend to be dark, we perform color contrast enhancement and image sharpening on the matted person image.
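For illustration, the following is a minimal preprocessing sketch in Python with OpenCV, assuming a uint8 BGR frame from the camera; the matting step (PaddleSeg) is omitted, and the resize stands in for the crop-to-standard-size step:

```python
import cv2
import numpy as np

def preprocess_person_image(frame: np.ndarray) -> np.ndarray:
    """Resize a captured uint8 BGR frame to 256 x 192, then enhance contrast and sharpen."""
    # Stand-in for the crop-to-standard-size step (OpenCV sizes are (width, height)).
    img = cv2.resize(frame, (192, 256))

    # Color contrast enhancement: CLAHE on the lightness channel of LAB space.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

    # Image sharpening via a simple unsharp mask.
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
    return cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
```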
The human body semantic rough segmentation image is obtained by designing and training a U-Net model with residual connection enhancement blocks on an existing human parsing dataset and applying the model to the virtual fitting task. Specifically, the human body image is segmented into a plurality of fine-grained semantic parts, such as body parts and clothing, so as to provide the model with more information about the person and achieve a more comfortable and ideal garment-changing effect.
Step S120, inputting the character image to be changed, the clothing image and the human body semantic rough segmentation image into a semantic generation model to obtain a human body semantic optimization image;
in an embodiment, the semantic generation model includes a feature extraction module, a feature enhancement module, and a semantic generation module, and the inputting the image of the character to be changed, the clothing image, and the human semantic rough segmentation image into the semantic generation model, to obtain a human semantic optimized image, includes:
the character image to be changed, the clothing image and the human semantic rough segmentation image are processed through the feature extraction module to obtain a plurality of initial feature images; fusing the plurality of initial feature images through the feature enhancement module to obtain an enhanced feature image; and the enhanced feature map is passed through the semantic generation module to obtain the human body semantic optimization image.
The inputs of the semantic generation model comprise the character image to be changed p, the clothing image g, and the human body semantic rough segmentation image p_M. Because clothing comes in many forms (some garments have collars, some have long or short sleeves), the semantic information of the neck and arms can change after the garment is changed; therefore, the neck and arm semantics in the semantic segmentation map of the character to be changed are set to the same semantic label as the clothing. The human semantic optimization image p_T of the character after the garment change is then generated by an improved U-Net model.
Specifically, referring to FIG. 2, p_R is the human semantic segmentation map of the unchanged p, and p_T is the human semantic segmentation map after the person has changed into garment g; the difference between them is that the garments in p and g differ. That is, the human semantic segmentation map of the person changes depending on whether g is long-sleeved or has a collar. Therefore, when generating p_T, the arm and neck semantics in p_R are set to the same label as the upper-body clothing, which allows the model to generate a p_T whose segmentation in the arm and neck regions conforms more closely to the shape of the sleeves and collar of the garment to be put on.
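As a concrete illustration of this relabeling, the snippet below merges the neck and arm labels into the clothing label on a parsing map; the label ids are hypothetical and depend on the human parsing dataset actually used:

```python
import torch

# Hypothetical label ids; the actual ids depend on the human parsing dataset used.
NECK, LEFT_ARM, RIGHT_ARM, UPPER_CLOTHES = 10, 14, 15, 5

# p_m: rough human parsing map p_M with integer labels, shape (H, W).
p_m = torch.randint(0, 20, (256, 192))

for part in (NECK, LEFT_ARM, RIGHT_ARM):
    # Merge neck/arm semantics into the clothing label so the model is free to
    # redraw these regions to match the sleeves and collar of the new garment.
    p_m[p_m == part] = UPPER_CLOTHES
```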
Based on the above, the semantic generation model provided in this embodiment solves the pixel localization problem by means of shallow visual information, solves the pixel classification problem by means of deep feature information, and integrates and reprocesses features using an attention mechanism.
In an embodiment, the feature enhancement module includes a channel attention block, a spatial attention block, and a jump connection structure;
referring to fig. 3, fig. 3 shows a schematic structural diagram of the feature enhancement module. The channel attention block is according to equation 1:
extracting a channel attention feature, wherein,representing channel attention features, σ representing Sigmoid functions, avgPool representing average pooling, maxPool representing maximum pooling, MLP representing a multi-layer perceptron, and F representing the input of the feature enhancement module;
the spatial attention block is according to equation 2:
extracting a spatial attention feature, wherein f 3×3 Representing a convolution operation with a convolution kernel size of 3 x 3,and->The feature maps obtained by the average pooling and the maximum pooling of the channel attention processed F in the channel direction are shown, respectively.
The feature enhancement module is an important component of the semantic generation model, and can further operate the formula 1 to obtain the formula 3:
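For concreteness, here is a minimal PyTorch sketch of a CBAM-style feature enhancement block implementing Equations 1 to 3; the class and parameter names are illustrative, not taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancement(nn.Module):
    """CBAM-style block: channel attention (Eq. 1/3) followed by spatial attention (Eq. 2)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP (W_0, W_1) applied to both pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W_0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W_1
        )
        # f^{3x3}: convolution over the 2-channel [avg; max] spatial descriptor.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))).
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)

        # Spatial attention: sigma(f^{3x3}([F_avg^s; F_max^s])), pooling along channels.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))
```

In the semantic generation model this block sits inside the U-Net jump connections, so the enhanced features are passed to the decoder path.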
step S130, inputting the character image to be changed and the clothing image into a clothing appearance flow generating network to obtain a rough deformed clothing image;
in the prior art, using thin-plate spline-based interpolation algorithms for garment warping can create unnatural and uneven warping of the garment. To achieve more natural garment warping, we have devised a garment appearance flow generation network that passes back-propagation algorithms to estimate dense appearance flows to cope with complex garment appearance changes.
In an embodiment, the garment appearance flow generating network includes a first encoding module, a second encoding module, a self-attention module, an up-sampling module and a deformation module, and the inputting the image of the person to be changed and the image of the garment into the garment appearance flow generating network, to obtain a rough deformed garment image includes:
obtaining character features from the character image to be changed through the first encoding module, and obtaining clothing features from the clothing image through the second encoding module; fusing the character features and the clothing features into a global style vector; passing the global style vector through the self-attention module to obtain an enhancement vector; passing the enhancement vector through the up-sampling module to obtain a clothing appearance flow; and passing the clothing appearance flow through the deformation module to obtain the rough deformed clothing image.
Referring to FIG. 4, the garment appearance flow generation network is composed of two convolutional encoders, E_p and E_g, which extract character features and clothing features, respectively; both consist of 4 downsampling convolution layers. The character features and the clothing features are fused into a global style vector z by direct concatenation:

z = [E_p(p), E_g(g)]

To better blend the character features and the clothing features, we enhance the global style vector using a self-attention mechanism:

z′ = softmax(MLP(z)) · z

We then estimate the garment appearance flow f using a four-layer upsampling convolutional decoder D, and finally the garment is coarsely deformed by the sampling operator S:

f = D(z′)
g′ = S(g, f)
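A condensed PyTorch sketch of this pipeline follows. It assumes the appearance flow is applied with grid_sample over an identity grid (the patent does not spell out the warping operator S), and a 1 × 1 convolution stands in for the MLP of the self-attention step; all module names and layer widths are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_encoder(in_ch: int) -> nn.Sequential:
    """4 downsampling convolution layers, as described for E_p and E_g."""
    layers, ch = [], in_ch
    for out_ch in (64, 128, 256, 256):
        layers += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        ch = out_ch
    return nn.Sequential(*layers)

class AppearanceFlowNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_p = conv_encoder(3)        # E_p: person encoder
        self.enc_g = conv_encoder(3)        # E_g: garment encoder
        self.attn = nn.Conv2d(512, 512, 1)  # stands in for the MLP of softmax(MLP(z)) . z
        # D: four upsampling layers ending in a 2-channel flow field (dx, dy).
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, person: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.enc_p(person), self.enc_g(garment)], dim=1)    # z = [E_p(p), E_g(g)]
        z = torch.softmax(self.attn(z).flatten(2), dim=-1).view_as(z) * z  # z' = softmax(MLP(z)) . z
        flow = self.dec(z).permute(0, 2, 3, 1)                             # f = D(z'), (N, H, W, 2)
        # S(g, f): warp the garment by sampling it at an identity grid offset by the flow.
        n, _, h, w = garment.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2).to(garment)
        return F.grid_sample(garment, grid + flow, align_corners=True)     # g' = S(g, f)
```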
step S140, a clothing semantic part is obtained according to the human semantic optimization image, and the rough deformed clothing image is corrected according to the clothing semantic part to obtain a fine deformed clothing image;
referring to fig. 5, since the garment appearance flow vector f is generated from the global style vector, the description of the local detail is lacking, thereby generating unnatural warp of the deformed garment. Therefore, we look at p from human semantics T Obtaining semantic graph g of deformed clothing M And carrying out local correction on the rough deformed clothing g ', thereby obtaining the refined deformed clothing g', wherein the specific formula is as follows:
g″ = g′ ⊙ g_M
This formula denotes an element-wise (Hadamard) product: the element in row i, column j of the matrix corresponding to the clothing semantic part is multiplied by the element in row i, column j of the matrix corresponding to the rough deformed clothing image, and the product becomes the element in row i, column j of the result matrix. The resulting matrix is the matrix corresponding to the fine deformed clothing image.
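As a minimal numerical illustration of this correction step (the shapes and the binary mask are assumed):

```python
import torch

# g_prime: coarse deformed garment g', shape (N, 3, H, W), values in [0, 1].
# g_mask:  clothing semantic part g_M taken from p_T, as a binary mask (N, 1, H, W).
g_prime = torch.rand(1, 3, 256, 192)
g_mask = (torch.rand(1, 1, 256, 192) > 0.5).float()

# g'' = g' (Hadamard product) g_M: the mask broadcasts over the channel dimension,
# zeroing every pixel that falls outside the garment's semantic region.
g_refined = g_prime * g_mask
```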
In an embodiment, the human body semantic optimization image includes a face semantic part, a hairstyle semantic part, a neck semantic part, an arm semantic part, and a clothing semantic part, and the obtaining the clothing semantic part according to the human body semantic optimization image includes: and separating the clothing semantic part from the human semantic optimization image.
In one embodiment, the correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image includes:
and carrying out multiplication operation on the semantic parts of the clothing and elements at the same position of the rough deformation clothing image one by one to obtain the fine deformation clothing image.
And step S150, inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
Referring to FIG. 6, in order to migrate the garment onto the body of the character to be changed efficiently and naturally, we use a U-Net model with residual connection enhancement blocks, i.e., we add feature enhancement modules in the jump connection structure of the U-Net model, the same architecture used above to generate the semantic map. Because the feature enhancement module optimizes the extracted features and attends to contextual information, the generated results conform more closely to real results.
Here we use the character to be changed p and the refined deformed garment g″ as inputs to the model, together with the human semantic segmentation map p_T, without requiring additional refinement operations. Finally, guided by the human semantic map p_T, the model completes the fusion of p and g″ and generates the desired try-on effect.
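To make the architecture concrete, here is a toy two-level U-Net in PyTorch that places a feature enhancement block inside the jump connection. It reuses the FeatureEnhancement block sketched earlier; the one-channel encoding of p_T and all layer widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TryOnUNet(nn.Module):
    """Toy two-level U-Net with a FeatureEnhancement block inside the jump connection."""

    def __init__(self, in_ch: int = 3 + 3 + 1, out_ch: int = 3):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(True))
        self.down2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(True))
        self.enhance = FeatureEnhancement(64)  # enhancement applied on the skip path
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.out = nn.Conv2d(128, out_ch, 3, padding=1)

    def forward(self, p, g_refined, p_t):
        # Inputs: person p (3 ch), refined garment g'' (3 ch), semantic map p_T (here 1 ch).
        x = torch.cat([p, g_refined, p_t], dim=1)
        d1 = self.down1(x)
        d2 = self.down2(d1)
        u = self.up(d2)
        skip = self.enhance(d1)  # residual-connection enhancement block in the jump connection
        return torch.sigmoid(self.out(torch.cat([u, skip], dim=1)))
```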
It should be noted that the same U-Net model with residual connection enhancement blocks as in step S120 can produce different outputs for the following reason. For the virtual fitting task there is a common dataset that provides both the semantic map of the character and the image of the fitting effect after the garment change. In the first model, the U-Net takes the three inputs p, p_M and g and generates a predicted human semantic map; the label is the post-change human semantic map provided by the dataset, and the two are trained and optimized through a loss function. The second U-Net works in the same way: given its inputs, it generates the re-dressed character image, and the label is the real re-dressed image provided by the dataset, with training again driven by loss functions. The two models share the same structure; only their inputs and outputs, that is, their tasks, differ. Thus, the same model can accomplish different tasks.
The virtual fitting method provided by this embodiment makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are greatly improved.
Example 2
In addition, the embodiment of the disclosure provides a virtual fitting device.
Specifically, as shown in fig. 7, the virtual fitting device 700 includes:
an acquisition module 710, configured to acquire a character image to be changed, a clothing image, and a human semantic rough segmentation image based on the real-time image;
the first generation module 720 is configured to input the image of the character to be changed, the image of the garment, and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
a second generating module 730, configured to input the image of the person to be changed and the image of the garment into a garment appearance stream generating network to obtain a rough deformed garment image;
the correction module 740 is configured to obtain a clothing semantic part according to the human semantic optimization image, and correct the coarse deformation clothing image according to the clothing semantic part to obtain a fine deformation clothing image;
and a third generating module 750, configured to input the fine deformation clothing image, the to-be-changed character image, and the human body semantic optimization image into the semantic generation model, so as to obtain a virtual fitting result.
The virtual fitting device 700 provided in this embodiment can implement the virtual fitting method provided in embodiment 1, and in order to avoid repetition, the description is omitted here.
The virtual fitting device provided by this embodiment makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are greatly improved.
Example 3
Furthermore, an embodiment of the present disclosure provides an electronic device comprising a memory and a processor, the memory storing a computer program that, when run on the processor, performs the virtual fitting method provided by embodiment 1.
The electronic device provided by the embodiment of the present invention may implement the virtual fitting method provided by embodiment 1, and in order to avoid repetition, details are not repeated here.
The electronic equipment provided by this embodiment makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are greatly improved.
Example 4
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the virtual fitting method provided by embodiment 1.
In the present embodiment, the computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.
The computer readable storage medium provided in this embodiment may implement the virtual fitting method provided in embodiment 1, and in order to avoid repetition, a detailed description is omitted here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or terminal comprising the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative, not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit and scope of the present application, which is also within the protection of the present application.

Claims (10)

1. A virtual fitting method, the method comprising:
acquiring a character image to be changed, a clothing image and a human semantic rough segmentation image based on the real-time image;
inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
inputting the character image to be changed and the clothing image into a clothing appearance flow generating network to obtain a rough deformed clothing image;
obtaining a clothing semantic part according to the human semantic optimization image, and correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image;
inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
2. The virtual fitting method according to claim 1, wherein the acquiring the image of the character to be changed based on the real-time image includes:
acquiring the real-time image, and cutting the real-time image into a preset size to obtain a cut image;
and removing the background of the clipping image to obtain the character image to be changed.
3. The virtual fitting method according to claim 1, wherein the semantic generation model includes a feature extraction module, a feature enhancement module, and a semantic generation module, the inputting the image of the person to be fitted, the image of the garment, and the human semantic rough segmentation image into the semantic generation model, obtaining a human semantic optimized image, comprises:
the character image to be changed, the clothing image and the human semantic rough segmentation image are processed through the feature extraction module to obtain a plurality of initial feature images;
fusing the plurality of initial feature images through the feature enhancement module to obtain an enhanced feature image;
and the enhanced feature map is passed through the semantic generation module to obtain the human body semantic optimization image.
4. A virtual fitting method according to claim 3, wherein the feature enhancement module comprises a channel attention block, a spatial attention block and a jump connection structure;
the channel attention block extracts a channel attention feature according to the formula:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

wherein M_c(F) represents the channel attention feature, σ represents the Sigmoid function, AvgPool represents average pooling, MaxPool represents maximum pooling, MLP represents a multi-layer perceptron, and F represents the input of the feature enhancement module;
the spatial attention block extracts a spatial attention feature according to the formula:

M_s(F) = σ(f^{3×3}([F_avg^s ; F_max^s]))

wherein f^{3×3} represents a convolution operation with a convolution kernel size of 3 × 3, and F_avg^s and F_max^s represent the feature maps obtained by average pooling and maximum pooling, respectively, of the channel-attention-processed F along the channel direction.
5. The virtual fitting method according to claim 1, wherein the clothing appearance flow generating network comprises a first encoding module, a second encoding module, a self-attention module, an up-sampling module and a deformation module, and the inputting the character image to be changed and the clothing image into the clothing appearance flow generating network to obtain a rough deformed clothing image comprises:
obtaining character features from the character image to be changed through the first encoding module, and obtaining clothing features from the clothing image through the second encoding module;
fusing the character features and the clothing features into a global style vector;
passing the global style vector through the self-attention module to obtain an enhancement vector;
passing the enhancement vector through the up-sampling module to obtain a clothing appearance flow;
and passing the clothing appearance flow through the deformation module to obtain the rough deformed clothing image.
6. The virtual fitting method according to claim 1, wherein the human body semantic optimization image includes a face semantic part, a hairstyle semantic part, a neck semantic part, an arm semantic part, and a clothing semantic part, the obtaining the clothing semantic part from the human body semantic optimization image includes:
and separating the clothing semantic part from the human semantic optimization image.
7. The virtual fitting method according to claim 1, wherein the correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image comprises:
and carrying out multiplication operation on the semantic parts of the clothing and elements at the same position of the rough deformation clothing image one by one to obtain the fine deformation clothing image.
8. A virtual fitting device, the device comprising:
the acquisition module is used for acquiring the character image to be changed, the clothing image and the human semantic rough segmentation image based on the real-time image;
the first generation module is used for inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
the second generation module is used for inputting the character image to be changed and the clothing image into a clothing appearance flow generation network to obtain a rough deformed clothing image;
the correction module is used for obtaining a clothing semantic part according to the human semantic optimization image, correcting the rough deformed clothing image according to the clothing semantic part and obtaining a fine deformed clothing image;
and the third generation module is used for inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
9. An electronic device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, performs the virtual fitting method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the virtual fitting method according to any of claims 1 to 7.
CN202310383484.1A 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium Pending CN116452291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310383484.1A CN116452291A (en) 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310383484.1A CN116452291A (en) 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116452291A true CN116452291A (en) 2023-07-18

Family

ID=87121465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310383484.1A Pending CN116452291A (en) 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116452291A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057976A (en) * 2023-08-04 2023-11-14 南通大学 Virtual fitting method based on local appearance flow
CN117057976B (en) * 2023-08-04 2024-03-19 南通大学 Virtual fitting method based on local appearance flow


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination