CN116452291A - Virtual fitting method, virtual fitting device, electronic equipment and storage medium - Google Patents

Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Info

Publication number
CN116452291A
CN116452291A
Authority
CN
China
Prior art keywords
image
semantic
clothing
module
virtual fitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310383484.1A
Other languages
Chinese (zh)
Inventor
张少林
张超速
石园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weifu Vision Beijing Technology Co ltd
Shenzhen Wave Kingdom Co ltd
Original Assignee
Weifu Vision Beijing Technology Co ltd
Shenzhen Wave Kingdom Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weifu Vision Beijing Technology Co ltd, Shenzhen Wave Kingdom Co ltd filed Critical Weifu Vision Beijing Technology Co ltd
Priority to CN202310383484.1A
Publication of CN116452291A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0641Shopping interfaces
    • G06Q30/0643Graphical representation of items or shoppers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping


Abstract

The embodiment of the invention provides a virtual fitting method, a virtual fitting device, an electronic device and a storage medium, and relates to the technical field of image processing. The method comprises the following steps: acquiring a character image to be changed, a clothing image and a human semantic rough segmentation image based on a real-time image; inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image; inputting the character image to be changed and the clothing image into a clothing appearance flow generation network to obtain a rough deformed clothing image; obtaining a clothing semantic part from the human semantic optimization image, and correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image; and inputting the fine deformed clothing image, the character image to be changed and the human semantic optimization image into the semantic generation model to obtain a virtual fitting result. The method greatly improves the practicality and reliability of virtual fitting.

Description

Virtual fitting method, virtual fitting device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a virtual fitting method, a virtual fitting device, an electronic device, and a storage medium.
Background
With the development of computer and internet technology, online shopping has become an increasingly popular way to shop. However, online shopping has drawbacks: clothing purchased online cannot be tried on before purchase as it can be in a physical store, so the purchased garment may not fit the consumer well or fully meet the consumer's needs.
Based on this, virtual fitting using computers has been studied. Mainstream virtual fitting techniques fall into two categories: virtual fitting based on three-dimensional reconstruction and virtual fitting based on deep learning. However, in the prior art, garment warping is generally unnatural and uneven, so the garment cannot fit the human body well and a good fitting result cannot be obtained.
Disclosure of Invention
In order to solve the technical problems, embodiments of the present application provide a virtual fitting method, a virtual fitting device, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a virtual fitting method, where the method includes:
acquiring a character image to be changed, a clothing image and a human semantic rough segmentation image based on the real-time image;
inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
inputting the character image to be changed and the clothing image into a clothing appearance flow generating network to obtain a rough deformed clothing image;
obtaining a clothing semantic part according to the human semantic optimization image, and correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image;
inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
In an embodiment, the acquiring the image of the character to be changed based on the real-time image includes:
acquiring the real-time image, and cutting the real-time image into a preset size to obtain a cut image;
and removing the background of the clipping image to obtain the character image to be changed.
In an embodiment, the semantic generation model includes a feature extraction module, a feature enhancement module, and a semantic generation module, and the inputting the image of the character to be changed, the clothing image, and the human semantic rough segmentation image into the semantic generation model, to obtain a human semantic optimized image, includes:
the character image to be changed, the clothing image and the human semantic rough segmentation image are processed through the feature extraction module to obtain a plurality of initial feature images;
fusing the plurality of initial feature images through the feature enhancement module to obtain an enhanced feature image;
and the enhanced feature map is passed through the semantic generation module to obtain the human body semantic optimization image.
In an embodiment, the feature enhancement module includes a channel attention block, a spatial attention block, and a jump connection structure;
the channel attention block is according to the formula
Extracting a channel attention feature, wherein,representing channel attention features, σ representing Sigmoid functions, avgPool representing average pooling, maxPool representing maximum pooling, MLP representing a multi-layer perceptron, and F representing the input of the feature enhancement module;
the spatial attention block is according to the formula
Extracting a spatial attention feature, wherein f 3×3 Representing a convolution operation with a convolution kernel size of 3 x 3,and->The feature maps obtained by the average pooling and the maximum pooling of the channel attention processed F in the channel direction are shown, respectively.
In an embodiment, the garment appearance flow generating network includes a first encoding module, a second encoding module, a self-attention module, an up-sampling module and a deformation module, and the inputting the image of the person to be changed and the image of the garment into the garment appearance flow generating network, to obtain a rough deformed garment image includes:
obtaining character features from the character image to be changed through the first encoding module, and obtaining clothing features from the clothing image through the second encoding module;
fusing the character features and the clothing features into a global style vector;
passing the global style vector through the self-attention module to obtain an enhancement vector;
passing the enhancement vector through the up-sampling module to obtain a clothing appearance flow;
and passing the clothing appearance flow through the deformation module to obtain the rough deformed clothing image.
In an embodiment, the human body semantic optimization image includes a face semantic part, a hairstyle semantic part, a neck semantic part, an arm semantic part, and a clothing semantic part, and the obtaining the clothing semantic part according to the human body semantic optimization image includes:
and separating the clothing semantic part from the human semantic optimization image.
In one embodiment, the correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image includes:
and carrying out multiplication operation on the semantic parts of the clothing and elements at the same position of the rough deformation clothing image one by one to obtain the fine deformation clothing image.
In a second aspect, embodiments of the present application provide a virtual fitting device, the device including:
the acquisition module is used for acquiring the character image to be changed, the clothing image and the human semantic rough segmentation image based on the real-time image;
the first generation module is used for inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
the second generation module is used for inputting the character image to be changed and the clothing image into a clothing appearance flow generation network to obtain a rough deformed clothing image;
the correction module is used for obtaining a clothing semantic part according to the human semantic optimization image, correcting the rough deformed clothing image according to the clothing semantic part and obtaining a fine deformed clothing image;
and the third generation module is used for inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the computer program executes the virtual fitting method provided in the first aspect when the processor runs.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when run on a processor, performs the virtual fitting method provided in the first aspect.
The virtual fitting method provided by the present application makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are thereby greatly improved.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are required for the embodiments will be briefly described, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like elements are numbered alike in the various figures.
Fig. 1 is a schematic flow chart of a virtual fitting method according to an embodiment of the present application;
FIG. 2 shows a schematic flow chart of obtaining a human semantic optimized image according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a feature enhancement module according to an embodiment of the disclosure;
FIG. 4 is a schematic flow chart of a rough deformed garment image according to an embodiment of the present application;
FIG. 5 shows a schematic flow chart of a method for obtaining a fine deformed garment image according to an embodiment of the present application;
fig. 6 is a schematic flow chart of obtaining virtual fitting results according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a virtual fitting device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.
The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
In the following, the terms "comprises", "comprising", "having" and their cognates, as used in the various embodiments of the present application, are intended only to refer to a particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be interpreted as excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of this application belong. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments.
Example 1
The embodiment of the disclosure provides a virtual fitting method.
Specifically, referring to fig. 1, the virtual fitting method includes:
step S110, acquiring a character image to be changed, a clothing image and a human body semantic rough segmentation image based on the real-time image;
in an embodiment, the acquiring the image of the character to be changed based on the real-time image includes: acquiring the real-time image, and cutting the real-time image into a preset size to obtain a cut image; and removing the background of the clipping image to obtain the character image to be changed.
Specifically, the human body image may be acquired from a real-time image captured by an image pickup apparatus. The character image to be changed may be acquired in real time using an Intel RealSense camera, and we then crop the acquired image to a standard size of 256 × 192, consistent with the image size of the VTON virtual try-on dataset. To prevent the background from interfering with the fitting effect, we matte the person out of the background using PaddleSeg, the person matting algorithm of Baidu's PaddlePaddle. Since person images captured in real scenes tend to be dark, we perform color contrast enhancement and image sharpening on the matted person image.
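For illustration, the following is a minimal preprocessing sketch in Python with OpenCV, assuming a uint8 BGR frame from the camera; the matting step (PaddleSeg) is omitted, and the resize stands in for the crop-to-standard-size step:

```python
import cv2
import numpy as np

def preprocess_person_image(frame: np.ndarray) -> np.ndarray:
    """Resize a captured uint8 BGR frame to 256 x 192, then enhance contrast and sharpen."""
    # Stand-in for the crop-to-standard-size step (OpenCV sizes are (width, height)).
    img = cv2.resize(frame, (192, 256))

    # Color contrast enhancement: CLAHE on the lightness channel of LAB space.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

    # Image sharpening via a simple unsharp mask.
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
    return cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
```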
The human body semantic rough segmentation image is obtained by designing and training a U-Net model with residual connection enhancement blocks on an existing human parsing dataset and applying the model to the virtual fitting task. Specifically, the human body image is segmented into a plurality of fine-grained semantic parts, such as body parts and clothing, so as to provide the model with more information about the person and achieve a more comfortable and ideal garment-changing effect.
Step S120, inputting the character image to be changed, the clothing image and the human body semantic rough segmentation image into a semantic generation model to obtain a human body semantic optimization image;
in an embodiment, the semantic generation model includes a feature extraction module, a feature enhancement module, and a semantic generation module, and the inputting the image of the character to be changed, the clothing image, and the human semantic rough segmentation image into the semantic generation model, to obtain a human semantic optimized image, includes:
the character image to be changed, the clothing image and the human semantic rough segmentation image are processed through the feature extraction module to obtain a plurality of initial feature images; fusing the plurality of initial feature images through the feature enhancement module to obtain an enhanced feature image; and the enhanced feature map is passed through the semantic generation module to obtain the human body semantic optimization image.
The inputs of the semantic generation model comprise the character image to be changed p, the clothing image g, and the human body semantic rough segmentation image p_M. Because clothing comes in many forms (some garments have collars, some have long or short sleeves), the semantic information of the neck and arms can change after the garment is changed; therefore, the neck and arm semantics in the semantic segmentation map of the character to be changed are set to the same semantic label as the clothing. The human semantic optimization image p_T of the character after the garment change is then generated by an improved U-Net model.
Specifically, referring to FIG. 2, p_R is the human semantic segmentation map of the unchanged p, and p_T is the human semantic segmentation map after the person has changed into garment g; the difference between them is that the garments in p and g differ. That is, the human semantic segmentation map of the person changes depending on whether g is long-sleeved or has a collar. Therefore, when generating p_T, the arm and neck semantics in p_R are set to the same label as the upper-body clothing, which allows the model to generate a p_T whose segmentation in the arm and neck regions conforms more closely to the shape of the sleeves and collar of the garment to be put on.
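As a concrete illustration of this relabeling, the snippet below merges the neck and arm labels into the clothing label on a parsing map; the label ids are hypothetical and depend on the human parsing dataset actually used:

```python
import torch

# Hypothetical label ids; the actual ids depend on the human parsing dataset used.
NECK, LEFT_ARM, RIGHT_ARM, UPPER_CLOTHES = 10, 14, 15, 5

# p_m: rough human parsing map p_M with integer labels, shape (H, W).
p_m = torch.randint(0, 20, (256, 192))

for part in (NECK, LEFT_ARM, RIGHT_ARM):
    # Merge neck/arm semantics into the clothing label so the model is free to
    # redraw these regions to match the sleeves and collar of the new garment.
    p_m[p_m == part] = UPPER_CLOTHES
```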
Based on the above, the semantic generation model provided in this embodiment solves the pixel localization problem by means of shallow visual information, solves the pixel classification problem by means of deep feature information, and integrates and reprocesses features using an attention mechanism.
In an embodiment, the feature enhancement module includes a channel attention block, a spatial attention block, and a jump connection structure;
referring to fig. 3, fig. 3 shows a schematic structural diagram of the feature enhancement module. The channel attention block is according to equation 1:
extracting a channel attention feature, wherein,representing channel attention features, σ representing Sigmoid functions, avgPool representing average pooling, maxPool representing maximum pooling, MLP representing a multi-layer perceptron, and F representing the input of the feature enhancement module;
the spatial attention block is according to equation 2:
extracting a spatial attention feature, wherein f 3×3 Representing a convolution operation with a convolution kernel size of 3 x 3,and->The feature maps obtained by the average pooling and the maximum pooling of the channel attention processed F in the channel direction are shown, respectively.
The feature enhancement module is an important component of the semantic generation model, and can further operate the formula 1 to obtain the formula 3:
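For concreteness, here is a minimal PyTorch sketch of a CBAM-style feature enhancement block implementing Equations 1 to 3; the class and parameter names are illustrative, not taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureEnhancement(nn.Module):
    """CBAM-style block: channel attention (Eq. 1/3) followed by spatial attention (Eq. 2)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP (W_0, W_1) applied to both pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W_0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W_1
        )
        # f^{3x3}: convolution over the 2-channel [avg; max] spatial descriptor.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: sigma(MLP(AvgPool(F)) + MLP(MaxPool(F))).
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)

        # Spatial attention: sigma(f^{3x3}([F_avg^s; F_max^s])), pooling along channels.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))
```

In the semantic generation model this block sits inside the U-Net jump connections, so the enhanced features are passed to the decoder path.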
step S130, inputting the character image to be changed and the clothing image into a clothing appearance flow generating network to obtain a rough deformed clothing image;
in the prior art, using thin-plate spline-based interpolation algorithms for garment warping can create unnatural and uneven warping of the garment. To achieve more natural garment warping, we have devised a garment appearance flow generation network that passes back-propagation algorithms to estimate dense appearance flows to cope with complex garment appearance changes.
In an embodiment, the garment appearance flow generating network includes a first encoding module, a second encoding module, a self-attention module, an up-sampling module and a deformation module, and the inputting the image of the person to be changed and the image of the garment into the garment appearance flow generating network, to obtain a rough deformed garment image includes:
obtaining character features from the character image to be changed through the first encoding module, and obtaining clothing features from the clothing image through the second encoding module; fusing the character features and the clothing features into a global style vector; passing the global style vector through the self-attention module to obtain an enhancement vector; passing the enhancement vector through the up-sampling module to obtain a clothing appearance flow; and passing the clothing appearance flow through the deformation module to obtain the rough deformed clothing image.
Referring to FIG. 4, the garment appearance flow generation network is composed of two convolutional encoders, E_p and E_g, which extract character features and clothing features, respectively; both consist of 4 downsampling convolution layers. The character features and the clothing features are fused into a global style vector z by direct concatenation:

z = [E_p(p), E_g(g)]

To better blend the character features and the clothing features, we enhance the global style vector using a self-attention mechanism:

z′ = softmax(MLP(z)) · z

We then estimate the garment appearance flow f using a four-layer upsampling convolutional decoder D, and finally the garment is coarsely deformed by the sampling operator S:

f = D(z′)
g′ = S(g, f)
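A condensed PyTorch sketch of this pipeline follows. It assumes the appearance flow is applied with grid_sample over an identity grid (the patent does not spell out the warping operator S), and a 1 × 1 convolution stands in for the MLP of the self-attention step; all module names and layer widths are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_encoder(in_ch: int) -> nn.Sequential:
    """4 downsampling convolution layers, as described for E_p and E_g."""
    layers, ch = [], in_ch
    for out_ch in (64, 128, 256, 256):
        layers += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        ch = out_ch
    return nn.Sequential(*layers)

class AppearanceFlowNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_p = conv_encoder(3)        # E_p: person encoder
        self.enc_g = conv_encoder(3)        # E_g: garment encoder
        self.attn = nn.Conv2d(512, 512, 1)  # stands in for the MLP of softmax(MLP(z)) . z
        # D: four upsampling layers ending in a 2-channel flow field (dx, dy).
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 2, 3, padding=1),
        )

    def forward(self, person: torch.Tensor, garment: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.enc_p(person), self.enc_g(garment)], dim=1)    # z = [E_p(p), E_g(g)]
        z = torch.softmax(self.attn(z).flatten(2), dim=-1).view_as(z) * z  # z' = softmax(MLP(z)) . z
        flow = self.dec(z).permute(0, 2, 3, 1)                             # f = D(z'), (N, H, W, 2)
        # S(g, f): warp the garment by sampling it at an identity grid offset by the flow.
        n, _, h, w = garment.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2).to(garment)
        return F.grid_sample(garment, grid + flow, align_corners=True)     # g' = S(g, f)
```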
step S140, a clothing semantic part is obtained according to the human semantic optimization image, and the rough deformed clothing image is corrected according to the clothing semantic part to obtain a fine deformed clothing image;
referring to fig. 5, since the garment appearance flow vector f is generated from the global style vector, the description of the local detail is lacking, thereby generating unnatural warp of the deformed garment. Therefore, we look at p from human semantics T Obtaining semantic graph g of deformed clothing M And carrying out local correction on the rough deformed clothing g ', thereby obtaining the refined deformed clothing g', wherein the specific formula is as follows:
g″ = g′ ⊙ g_M
This formula denotes an element-wise (Hadamard) product: the element in row i, column j of the matrix corresponding to the clothing semantic part is multiplied by the element in row i, column j of the matrix corresponding to the rough deformed clothing image, and the product becomes the element in row i, column j of the result matrix. The resulting matrix is the matrix corresponding to the fine deformed clothing image.
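As a minimal numerical illustration of this correction step (the shapes and the binary mask are assumed):

```python
import torch

# g_prime: coarse deformed garment g', shape (N, 3, H, W), values in [0, 1].
# g_mask:  clothing semantic part g_M taken from p_T, as a binary mask (N, 1, H, W).
g_prime = torch.rand(1, 3, 256, 192)
g_mask = (torch.rand(1, 1, 256, 192) > 0.5).float()

# g'' = g' (Hadamard product) g_M: the mask broadcasts over the channel dimension,
# zeroing every pixel that falls outside the garment's semantic region.
g_refined = g_prime * g_mask
```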
In an embodiment, the human body semantic optimization image includes a face semantic part, a hairstyle semantic part, a neck semantic part, an arm semantic part, and a clothing semantic part, and the obtaining the clothing semantic part according to the human body semantic optimization image includes: and separating the clothing semantic part from the human semantic optimization image.
In one embodiment, the correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image includes:
and carrying out multiplication operation on the semantic parts of the clothing and elements at the same position of the rough deformation clothing image one by one to obtain the fine deformation clothing image.
And step S150, inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
Referring to FIG. 6, in order to migrate the garment onto the body of the character to be changed efficiently and naturally, we use a U-Net model with residual connection enhancement blocks, i.e., we add feature enhancement modules in the jump connection structure of the U-Net model, the same architecture used above to generate the semantic map. Because the feature enhancement module optimizes the extracted features and attends to contextual information, the generated results conform more closely to real results.
Here we use the character to be changed p and the refined deformed garment g″ as inputs to the model, together with the human semantic segmentation map p_T, without requiring additional refinement operations. Finally, guided by the human semantic map p_T, the model completes the fusion of p and g″ and generates the desired try-on effect.
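To make the architecture concrete, here is a toy two-level U-Net in PyTorch that places a feature enhancement block inside the jump connection. It reuses the FeatureEnhancement block sketched earlier; the one-channel encoding of p_T and all layer widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TryOnUNet(nn.Module):
    """Toy two-level U-Net with a FeatureEnhancement block inside the jump connection."""

    def __init__(self, in_ch: int = 3 + 3 + 1, out_ch: int = 3):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(True))
        self.down2 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(True))
        self.enhance = FeatureEnhancement(64)  # enhancement applied on the skip path
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.out = nn.Conv2d(128, out_ch, 3, padding=1)

    def forward(self, p, g_refined, p_t):
        # Inputs: person p (3 ch), refined garment g'' (3 ch), semantic map p_T (here 1 ch).
        x = torch.cat([p, g_refined, p_t], dim=1)
        d1 = self.down1(x)
        d2 = self.down2(d1)
        u = self.up(d2)
        skip = self.enhance(d1)  # residual-connection enhancement block in the jump connection
        return torch.sigmoid(self.out(torch.cat([u, skip], dim=1)))
```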
It should be noted that the same U-Net model with residual connection enhancement blocks as in step S120 can produce different outputs for the following reason. For the virtual fitting task there is a common dataset that provides both the semantic map of the character and the image of the fitting effect after the garment change. In the first model, the U-Net takes the three inputs p, p_M and g and generates a predicted human semantic map; the label is the post-change human semantic map provided by the dataset, and the two are trained and optimized through a loss function. The second U-Net works in the same way: given its inputs, it generates the re-dressed character image, and the label is the real re-dressed image provided by the dataset, with training again driven by loss functions. The two models share the same structure; only their inputs and outputs, that is, their tasks, differ. Thus, the same model can accomplish different tasks.
The virtual fitting method provided by this embodiment makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are greatly improved.
Example 2
In addition, the embodiment of the disclosure provides a virtual fitting device.
Specifically, as shown in fig. 7, the virtual fitting device 700 includes:
an acquisition module 710, configured to acquire a character image to be changed, a clothing image, and a human semantic rough segmentation image based on the real-time image;
the first generation module 720 is configured to input the image of the character to be changed, the image of the garment, and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
a second generating module 730, configured to input the image of the person to be changed and the image of the garment into a garment appearance stream generating network to obtain a rough deformed garment image;
the correction module 740 is configured to obtain a clothing semantic part according to the human semantic optimization image, and correct the coarse deformation clothing image according to the clothing semantic part to obtain a fine deformation clothing image;
and a third generating module 750, configured to input the fine deformation clothing image, the to-be-changed character image, and the human body semantic optimization image into the semantic generation model, so as to obtain a virtual fitting result.
The virtual fitting device 700 provided in this embodiment can implement the virtual fitting method provided in embodiment 1, and in order to avoid repetition, the description is omitted here.
The virtual fitting device provided by this embodiment makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are greatly improved.
Example 3
Furthermore, an embodiment of the present disclosure provides an electronic device comprising a memory and a processor, the memory storing a computer program that, when run on the processor, performs the virtual fitting method provided by embodiment 1.
The electronic device provided by the embodiment of the present invention may implement the virtual fitting method provided by embodiment 1, and in order to avoid repetition, details are not repeated here.
The electronic equipment provided by this embodiment makes the virtual garment fit the human body better and fuses the human body and the garment seamlessly while keeping the appearance characteristics of the garment unchanged. After the garment is put on, the invariance of the human body's semantic information is guaranteed: regions not occluded by the garment, such as the hands and head, remain unchanged and consistent with their appearance before the garment change. The practicality and reliability of virtual fitting are greatly improved.
Example 4
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the virtual fitting method provided by embodiment 1.
In the present embodiment, the computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like.
The computer readable storage medium provided in this embodiment may implement the virtual fitting method provided in embodiment 1, and in order to avoid repetition, a detailed description is omitted here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or terminal comprising the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative, not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit and scope of the present application, which is also within the protection of the present application.

Claims (10)

1. A virtual fitting method, the method comprising:
acquiring a character image to be changed, a clothing image and a human semantic rough segmentation image based on the real-time image;
inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
inputting the character image to be changed and the clothing image into a clothing appearance flow generating network to obtain a rough deformed clothing image;
obtaining a clothing semantic part according to the human semantic optimization image, and correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image;
inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
2. The virtual fitting method according to claim 1, wherein the acquiring the image of the character to be changed based on the real-time image includes:
acquiring the real-time image, and cutting the real-time image into a preset size to obtain a cut image;
and removing the background of the clipping image to obtain the character image to be changed.
3. The virtual fitting method according to claim 1, wherein the semantic generation model includes a feature extraction module, a feature enhancement module, and a semantic generation module, the inputting the image of the person to be fitted, the image of the garment, and the human semantic rough segmentation image into the semantic generation model, obtaining a human semantic optimized image, comprises:
the character image to be changed, the clothing image and the human semantic rough segmentation image are processed through the feature extraction module to obtain a plurality of initial feature images;
fusing the plurality of initial feature images through the feature enhancement module to obtain an enhanced feature image;
and the enhanced feature map is passed through the semantic generation module to obtain the human body semantic optimization image.
4. A virtual fitting method according to claim 3, wherein the feature enhancement module comprises a channel attention block, a spatial attention block and a jump connection structure;
the channel attention block extracts a channel attention feature according to the formula:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

wherein M_c(F) represents the channel attention feature, σ represents the Sigmoid function, AvgPool represents average pooling, MaxPool represents maximum pooling, MLP represents a multi-layer perceptron, and F represents the input of the feature enhancement module;
the spatial attention block extracts a spatial attention feature according to the formula:

M_s(F) = σ(f^{3×3}([F_avg^s ; F_max^s]))

wherein f^{3×3} represents a convolution operation with a convolution kernel size of 3 × 3, and F_avg^s and F_max^s represent the feature maps obtained by average pooling and maximum pooling, respectively, of the channel-attention-processed F along the channel direction.
5. The virtual fitting method according to claim 1, wherein the clothing appearance flow generating network comprises a first encoding module, a second encoding module, a self-attention module, an up-sampling module and a deformation module, and the inputting the character image to be changed and the clothing image into the clothing appearance flow generating network to obtain a rough deformed clothing image comprises:
obtaining character features from the character image to be changed through the first encoding module, and obtaining clothing features from the clothing image through the second encoding module;
fusing the character features and the clothing features into a global style vector;
passing the global style vector through the self-attention module to obtain an enhancement vector;
passing the enhancement vector through the up-sampling module to obtain a clothing appearance flow;
and passing the clothing appearance flow through the deformation module to obtain the rough deformed clothing image.
6. The virtual fitting method according to claim 1, wherein the human body semantic optimization image includes a face semantic part, a hairstyle semantic part, a neck semantic part, an arm semantic part, and a clothing semantic part, the obtaining the clothing semantic part from the human body semantic optimization image includes:
and separating the clothing semantic part from the human semantic optimization image.
7. The virtual fitting method according to claim 1, wherein the correcting the rough deformed clothing image according to the clothing semantic part to obtain a fine deformed clothing image comprises:
and carrying out multiplication operation on the semantic parts of the clothing and elements at the same position of the rough deformation clothing image one by one to obtain the fine deformation clothing image.
8. A virtual fitting device, the device comprising:
the acquisition module is used for acquiring the character image to be changed, the clothing image and the human semantic rough segmentation image based on the real-time image;
the first generation module is used for inputting the character image to be changed, the clothing image and the human semantic rough segmentation image into a semantic generation model to obtain a human semantic optimization image;
the second generation module is used for inputting the character image to be changed and the clothing image into a clothing appearance flow generation network to obtain a rough deformed clothing image;
the correction module is used for obtaining a clothing semantic part according to the human semantic optimization image, correcting the rough deformed clothing image according to the clothing semantic part and obtaining a fine deformed clothing image;
and the third generation module is used for inputting the fine deformation clothing image, the character image to be changed and the human body semantic optimization image into the semantic generation model to obtain a virtual fitting result.
9. An electronic device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, performs the virtual fitting method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the virtual fitting method according to any of claims 1 to 7.
CN202310383484.1A 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium Pending CN116452291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310383484.1A CN116452291A (en) 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310383484.1A CN116452291A (en) 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116452291A true CN116452291A (en) 2023-07-18

Family

ID=87121465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310383484.1A Pending CN116452291A (en) 2023-04-06 2023-04-06 Virtual fitting method, virtual fitting device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116452291A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057976A (en) * 2023-08-04 2023-11-14 南通大学 Virtual fitting method based on local appearance flow
CN117057976B (en) * 2023-08-04 2024-03-19 南通大学 Virtual fitting method based on local appearance flow


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination