CN114926324A - Virtual fitting model training method based on real person image, virtual fitting method, device and equipment
- Publication number: CN114926324A
- Application number: CN202210593210.0A
- Authority: CN (China)
- Prior art keywords: human body; image; estimation; virtual fitting; clothes
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/04—Context-preserving geometric image transformations in the plane of the image, e.g. by using an importance map
- G06N3/045—Neural network architectures: combinations of networks
- G06N3/047—Neural network architectures: probabilistic or stochastic networks
- G06N3/08—Neural networks: learning methods
- G06T11/00—2D [Two Dimensional] image generation
- G06V10/26—Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/40—Extraction of image or video features
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V40/20—Recognition of human movements or behaviour, e.g. gesture recognition
Abstract
The embodiment of the invention discloses a virtual fitting model training method based on real person images, a virtual fitting method, a virtual fitting device and equipment, relating to the technical field of virtual fitting. The method comprises the following steps: training a virtual fitting model based on two-dimensional original clothes images and two-dimensional real person images, wherein the virtual fitting model comprises a feature pyramid network, an appearance flow estimation network and a generative adversarial network; after the model is trained, acquiring a two-dimensional real person image and a two-dimensional real clothes image selected by a user; performing posture estimation and image semantic segmentation respectively on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area; and inputting the human body posture estimation, the human body dressing area and the two-dimensional real clothes image into the trained virtual fitting model, which generates a virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence. The embodiment of the invention not only reduces the fitting cost but also produces a real and natural fitting effect.
Description
Technical Field
The invention relates to the technical field of virtual fitting, and in particular to a virtual fitting model training method based on real person images, a virtual fitting method, a virtual fitting device and computer equipment.
Background
With the development of computer technology and online shopping platforms, virtual fitting has attracted wide attention. Existing virtual fitting is mainly divided into 2D and 3D approaches. 3D virtual fitting generally needs to construct a vivid 3D model to achieve an acceptable fitting effect; however, every piece of clothes requires its own clothes model, and different customer figures also require adapted 3D human-body models, so development and maintenance costs are high. 2D virtual fitting relies on a 3D-simulated user image onto which the clothes are mapped; such a method of first generating a three-dimensional model and then forming a two-dimensional image requires excessive offline processing, is limited by hardware, and yields a fitting effect that is unrealistic and unnatural.
Disclosure of Invention
The embodiment of the invention provides a virtual fitting model training method based on real person images, a virtual fitting method, a virtual fitting device and equipment, aiming to solve the problems that existing virtual fitting is costly and its fitting effect is unrealistic and unnatural.
In a first aspect, an embodiment of the present invention provides a virtual fitting model training method based on real person images, which includes:
acquiring original training data, wherein the original training data comprises a two-dimensional original clothes image and a two-dimensional real person image wearing the corresponding clothes;
performing posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area;
inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features;
inputting the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map;
inputting the appearance flow estimation map, the human body posture estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image;
and performing iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
In a second aspect, an embodiment of the present invention provides a virtual fitting method, which includes:
acquiring a two-dimensional real person image and a two-dimensional real clothes image selected by a user;
performing posture estimation and image semantic segmentation respectively on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area;
inputting the human body posture estimation, the human body dressing area and the two-dimensional real clothes image into the virtual fitting model of the first aspect, and generating a virtual fitting image after feature extraction, appearance flow estimation and synthesis processing in sequence.
In a third aspect, an embodiment of the present invention further provides a virtual fitting model training apparatus based on real person images, including:
a first acquisition unit configured to acquire original training data, the original training data including a two-dimensional original clothes image and a two-dimensional real person image wearing the corresponding clothes;
an estimation and annotation unit configured to perform posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area;
a feature extraction and fusion unit configured to input the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features;
an appearance flow estimation unit configured to input the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map;
a first generation unit configured to input the appearance flow estimation map, the human body posture estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image;
and a training unit configured to perform iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
In a fourth aspect, an embodiment of the present invention further provides a virtual fitting apparatus, which includes:
a second acquisition unit configured to acquire a two-dimensional real person image and a two-dimensional real clothes image selected by a user;
an estimation and segmentation unit configured to perform posture estimation and image semantic segmentation respectively on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area;
and a second generation unit configured to input the human body posture estimation, the human body dressing area and the two-dimensional real clothes image into the virtual fitting model of the first aspect and generate a virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence.
In a fifth aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the methods of the first and second aspects when executing the computer program.
The embodiment of the invention provides a virtual fitting model training method based on real person images, a virtual fitting method, a virtual fitting device and equipment. The method comprises the following steps: first, a virtual fitting model comprising a feature pyramid network, an appearance flow estimation network and a generative adversarial network is trained on two-dimensional original clothes images and two-dimensional real person images; after the model is trained, a two-dimensional real person image and a two-dimensional real clothes image selected by a user are acquired; posture estimation and image semantic segmentation are performed respectively on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area; and the human body posture estimation, the human body dressing area and the two-dimensional real clothes image are input into the trained virtual fitting model, which generates a virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence. According to the technical scheme of the embodiment of the invention, the model is trained in the training stage and applied directly to the acquired user images in the application stage; only two-dimensional images are used throughout the virtual fitting process and no three-dimensional model needs to be generated or used, so the fitting cost is reduced and the fitting effect is real and natural.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a virtual fitting model training method based on real person images according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating the effect of the virtual fitting model training method based on real person images according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a virtual fitting method according to an embodiment of the present invention;
Fig. 4 is a schematic block diagram of a virtual fitting model training apparatus based on real person images according to an embodiment of the present invention;
Fig. 5 is a schematic block diagram of a virtual fitting apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Referring to fig. 1, fig. 1 is a schematic flowchart of a virtual fitting model training method based on real person images according to an embodiment of the present invention. The method can be applied to a server and implemented by a software program configured on the server. The method is described in detail below. As shown in fig. 1, it includes the following steps S100-S150.
S100, acquiring original training data, wherein the original training data comprises two-dimensional original clothes images and two-dimensional real person images wearing corresponding clothes.
In the embodiment of the invention, to train the virtual fitting model, the original training data set required for training must first be obtained; the original training data comprises two-dimensional original clothes images and two-dimensional real person images wearing the corresponding clothes. Understandably, the two-dimensional original clothes images include various kinds of clothes images, such as long-sleeve, short-sleeve and sleeveless images, each of which in turn includes long, medium and short styles; the two-dimensional real person images include images of models wearing the various kinds of clothes.
S110, performing posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area.
In the embodiment of the invention, after the two-dimensional real person images are acquired, posture estimation is performed on each image through a posture detection model to obtain the human body posture estimation, where the posture detection model is an existing model such as Pr-VIPE or another state-of-the-art posture detection model; the hair, face and lower-body clothes regions in the two-dimensional real person image are then annotated to obtain the human body dressing area, i.e. the annotated dressing region. It should be noted that, in the embodiment of the present invention, the human body posture estimation includes data of human key points such as the nose, left eye, right eye, left ear, right ear, left shoulder and right shoulder.
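For illustration, the following is a minimal sketch (not taken from the patent) of one common way to turn detected keypoints into a network-ready posture representation, rendering each joint as a Gaussian heatmap channel; the keypoint list and the `sigma` value are assumptions.

```python
import numpy as np

# Truncated COCO-style keypoint list, matching the joints named above.
KEYPOINTS = ["nose", "left_eye", "right_eye", "left_ear", "right_ear",
             "left_shoulder", "right_shoulder"]

def keypoints_to_heatmaps(keypoints, height, width, sigma=6.0):
    """keypoints: dict mapping joint name -> (x, y) in pixels, or None
    if the joint was not detected. Returns (num_joints, H, W) maps."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((len(KEYPOINTS), height, width), dtype=np.float32)
    for i, name in enumerate(KEYPOINTS):
        pt = keypoints.get(name)
        if pt is None:
            continue  # leave the channel empty for missing joints
        x, y = pt
        maps[i] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps
```

The resulting per-joint channels can be concatenated with the dressing-area mask and the clothes image along the channel axis before entering the feature extraction network.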
S120, inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features.
In the embodiment of the invention, the human body posture estimation, the human body dressing area and the two-dimensional original clothes image are input into a preset feature extraction network for feature extraction to obtain a plurality of posture features, a plurality of human body features and a plurality of clothes features. The preset feature extraction network is a feature pyramid network, specifically a five-layer recursive pyramid network in which adjacent network layers are computed through convolutions with a stride of 2; the plurality of posture features, human body features and clothes features are then fused respectively to obtain the human body posture features, human body semantic features and clothes features. It should be noted that, in the embodiment of the present invention, the feature pyramid network further includes 2 residual modules, and a feature pyramid network is used here because it extracts features more accurately.
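As a rough PyTorch illustration of the structure just described, the sketch below builds a five-level pyramid with stride-2 convolutions between adjacent levels and two residual modules; the channel widths, activation placement and position of the residual modules are assumptions not specified by the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))  # identity shortcut

class FeaturePyramid(nn.Module):
    """Five-level pyramid: stride-2 convolutions between adjacent
    levels, plus two residual modules after the stem."""
    def __init__(self, in_ch, base_ch=64, levels=5):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, base_ch, 3, padding=1)
        self.res = nn.Sequential(ResidualBlock(base_ch), ResidualBlock(base_ch))
        self.downs = nn.ModuleList(
            nn.Conv2d(base_ch, base_ch, 3, stride=2, padding=1)
            for _ in range(levels - 1))

    def forward(self, x):
        feats = [self.res(self.stem(x))]
        for down in self.downs:
            feats.append(torch.relu(down(feats[-1])))
        return feats  # one feature map per pyramid level, fine to coarse
```

In this reading, one such pyramid per input (posture, dressing area, clothes) produces the multi-scale features that are subsequently fused.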
S130, inputting the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map.
In the embodiment of the invention, the appearance flow estimation network first estimates an appearance flow from the human body posture features, the human body semantic features and the clothes features, and then deforms the two-dimensional original clothes image according to the appearance flow to obtain the appearance flow estimation map. Understandably, during the estimation, in order to avoid abrupt jumps between adjacent pixels of the two-dimensional original clothes image, that is, to better preserve the characteristics of the clothes, a smoothness constraint is introduced to enforce local co-linearity of the appearance flow. In the embodiment of the present invention, a second-order smoothness constraint is adopted; in other embodiments, smoothness constraints of other orders, e.g. a first-order constraint, may be adopted. It should further be noted that, in the embodiment of the present invention, the appearance flow estimation network includes 5 flow network modules, each of which includes 4 convolutions.
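The warping and the second-order smoothness constraint can be sketched as follows; this is an assumption-laden illustration rather than the patent's implementation. The flow displaces each garment pixel via bilinear sampling, and the constraint penalizes second-order differences of the flow so that neighboring displacements stay locally co-linear.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(cloth, flow):
    """cloth: (B, 3, H, W); flow: (B, 2, H, W) pixel offsets.
    Samples each output pixel at its flow-displaced location."""
    b, _, h, w = cloth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(cloth.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow
    # normalize sampling coordinates to [-1, 1] for grid_sample
    coords_x = 2 * coords[:, 0] / (w - 1) - 1
    coords_y = 2 * coords[:, 1] / (h - 1) - 1
    sample = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(cloth, sample, align_corners=True)

def second_order_smoothness(flow):
    """Penalizes deviation from local co-linearity: the flow at each
    pixel should be close to the average of its two axis neighbors."""
    dxx = flow[:, :, :, :-2] - 2 * flow[:, :, :, 1:-1] + flow[:, :, :, 2:]
    dyy = flow[:, :, :-2, :] - 2 * flow[:, :, 1:-1, :] + flow[:, :, 2:, :]
    return dxx.abs().mean() + dyy.abs().mean()
```

A first-order variant, as mentioned above, would penalize first differences of the flow instead of second differences.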
S140, inputting the appearance flow estimation map, the human body posture estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image.
In the embodiment of the present invention, the appearance flow estimation map, the human body posture estimation and the human body dressing area are input into a generative adversarial network to obtain the virtual fitting image. The generative adversarial network is pix2pix, which comprises a generator and a discriminator, and the virtual fitting image is produced through the interplay of the generator and the discriminator.
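Below is a minimal pix2pix-style training step, sketched under the assumption that the generator maps the channel-wise concatenated conditions to a try-on image and the discriminator scores (condition, image) pairs, as in the original pix2pix; the helper names and tensor layout are hypothetical.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def gan_step(generator, discriminator, flow_map, pose, dressing, real_img):
    """One adversarial step: the discriminator learns to separate real
    from generated pairs, the generator learns to fool it."""
    cond = torch.cat([flow_map, pose, dressing], dim=1)  # channel fusion
    fake_img = generator(cond)

    # discriminator targets: real pairs -> 1, generated pairs -> 0
    d_real = discriminator(torch.cat([cond, real_img], dim=1))
    d_fake = discriminator(torch.cat([cond, fake_img.detach()], dim=1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))

    # generator objective: make generated pairs score as real
    g_adv = discriminator(torch.cat([cond, fake_img], dim=1))
    g_loss = bce(g_adv, torch.ones_like(g_adv))
    return fake_img, g_loss, d_loss
```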
S150, performing iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
In the embodiment of the invention, the constructed virtual fitting model comprises the feature pyramid network, the appearance flow estimation network and the generative adversarial network. After the virtual fitting image is generated, a loss value is calculated through a preset loss function from the virtual fitting image and the two-dimensional real person image; if the loss value is not less than a preset loss value and the number of training iterations is less than a preset number, the model is trained iteratively. Understandably, if the loss value is smaller than the preset loss value or the number of training iterations is not less than the preset number, the trained model is taken as the virtual fitting model. It should be noted that, in the embodiment of the present invention, the loss value is calculated from the loss in the smoothing process, the visual similarity between the try-on image and the reference image, and the logarithm of the element-wise difference; that is, the loss function is composed of a second-order smoothness loss, a visual-similarity loss and an element-wise loss, where the element-wise loss may involve operations such as activation functions, absolute values and square roots.
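The composite loss described above might be assembled as follows; the weights and the `vgg_features` callable (standing in for the visual-similarity term, commonly computed over pretrained feature maps) are assumptions, and `second_order_smoothness` is reused from the earlier sketch.

```python
import torch.nn.functional as F

def total_loss(fake_img, real_img, flow, vgg_features,
               w_l1=1.0, w_perc=1.0, w_smooth=0.5):
    """Illustrative composite loss: element-wise + visual-similarity +
    second-order smoothness terms. Weights are not from the patent."""
    l1 = F.l1_loss(fake_img, real_img)                 # element-wise term
    perc = sum(F.l1_loss(f, r) for f, r in             # visual similarity
               zip(vgg_features(fake_img), vgg_features(real_img)))
    smooth = second_order_smoothness(flow)             # second-order term
    return w_l1 * l1 + w_perc * perc + w_smooth * smooth
```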
It should further be noted that the model training process is shown in fig. 2. From left to right, the first image is the two-dimensional real person image and the third is the two-dimensional original clothes image; together they form the original training data. The second and fourth images are obtained by processing the first: the second is the human body dressing area, and the fourth combines the human body posture estimation with the human body dressing area. The fifth image is the appearance flow estimation map, that is, the image generated after passing through the feature pyramid network, the appearance flow estimation network and the generative adversarial network.
Referring to fig. 3, fig. 3 is a schematic flowchart of a virtual fitting method according to an embodiment of the present invention. The virtual fitting method can be applied to a terminal, for example implemented through a software program configured on the terminal, referred to here as virtual fitting software; it reduces the fitting cost while producing a real and natural fitting effect. The method is described in detail below. As shown in fig. 3, it includes the following steps S200-S220.
S200, acquiring a two-dimensional real person image and a two-dimensional real clothes image selected by a user;
S210, performing posture estimation and image semantic segmentation respectively on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area;
S220, inputting the human body posture estimation, the human body dressing area and the two-dimensional real clothes image into the virtual fitting model, and generating a virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence.
In the embodiment of the invention, before virtual fitting, the user needs to select a two-dimensional real person image and a two-dimensional real clothes image. In practical application, the virtual fitting software shows the user a clothes selection interface comprising at least one clothes picture, so that the user can select a desired garment as the two-dimensional real clothes image. After the user selects the two-dimensional real clothes image, that is, after the virtual fitting software receives the corresponding trigger instruction, photographing guide information is shown to the user so that the user can take photos accordingly; when photographing is finished, the software shows a photo selection interface comprising at least one user picture, from which the user selects a suitable picture as the two-dimensional real person image. Once both images are confirmed, the virtual fitting software acquires them and performs posture estimation on the two-dimensional real person image through a posture estimation model, such as the existing OpenPose model, to obtain the human body posture estimation; it then performs image semantic segmentation on the two-dimensional real person image through a human image segmentation model, such as the existing BodyPix model, to obtain the human body dressing area. After the human body posture estimation and the human body dressing area are obtained, they are input together with the two-dimensional real clothes image into the virtual fitting model, which generates the virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence. Understandably, after the virtual fitting image is generated, it is displayed for the user to view.
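The whole inference path can be summarized in a short sketch; `pose_model`, `seg_model` and `fit_model` are hypothetical stand-ins for the posture estimator (e.g. OpenPose), the human image segmentation model (e.g. BodyPix) and the trained virtual fitting model.

```python
def virtual_try_on(person_img, cloth_img, pose_model, seg_model, fit_model):
    """End-to-end inference sketch for steps S200-S220."""
    pose = pose_model(person_img)       # human body posture estimation
    dressing = seg_model(person_img)    # human body dressing area mask
    # the fitting model internally performs feature extraction,
    # appearance flow estimation, and synthesis in sequence
    return fit_model(pose, dressing, cloth_img)
```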
Fig. 4 is a schematic block diagram of a virtual fitting model training apparatus 200 based on real person images according to an embodiment of the present invention. As shown in fig. 4, corresponding to the above virtual fitting model training method based on real person images, the present invention also provides a virtual fitting model training apparatus 200 that includes units for executing the method; the apparatus may be configured in a server. Specifically, referring to fig. 4, the apparatus 200 includes a first acquisition unit 201, an estimation and annotation unit 202, a feature extraction and fusion unit 203, an appearance flow estimation unit 204, a first generation unit 205 and a training unit 206.
The first acquisition unit 201 is configured to acquire original training data, where the original training data includes a two-dimensional original clothes image and a two-dimensional real person image wearing the corresponding clothes; the estimation and annotation unit 202 is configured to perform posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area; the feature extraction and fusion unit 203 is configured to input the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features; the appearance flow estimation unit 204 is configured to input the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map; the first generation unit 205 is configured to input the appearance flow estimation map, the human body posture estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image; and the training unit 206 is configured to perform iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
In some embodiments, such as this embodiment, the estimation and annotation unit 202 includes a posture estimation unit and an annotation unit.
The posture estimation unit is configured to perform posture estimation on the two-dimensional real person image through a posture detection model to obtain a human body posture estimation; the annotation unit is configured to annotate the hair, face and lower-body clothes regions in the two-dimensional real person image to obtain a human body dressing area.
In some embodiments, such as the present embodiment, the feature extraction and fusion unit 203 includes an extraction unit and a fusion unit.
The extraction unit is used for inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a feature pyramid network for feature extraction to obtain a plurality of posture features, a plurality of human body features and a plurality of clothes features; the fusion unit is used for respectively fusing the plurality of posture features, the plurality of human body features and the plurality of clothes features to obtain human body posture features, human body semantic features and clothes features.
In some embodiments, such as this embodiment, the training unit 206 includes a computation unit, a training subunit, and a generation subunit.
The computation unit is configured to calculate a loss value through a preset loss function according to the virtual fitting image and the two-dimensional real person image; the training subunit is configured to perform iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network if the loss value is not less than a preset loss value and the number of training iterations is less than a preset number; the generation subunit is configured to take the trained preset feature extraction network, appearance flow estimation network and generative adversarial network as the virtual fitting model if the loss value is smaller than the preset loss value or the number of training iterations is not less than the preset number.
Fig. 5 is a schematic block diagram of a virtual fitting apparatus 300 according to an embodiment of the present invention. As shown in fig. 5, corresponding to the above virtual fitting method, the present invention also provides a virtual fitting apparatus 300 that includes units for performing the method; the apparatus may be configured in a terminal. Specifically, referring to fig. 5, the virtual fitting apparatus 300 includes a second acquisition unit 301, an estimation and segmentation unit 302 and a second generation unit 303.
The second acquisition unit 301 is configured to acquire a two-dimensional real person image and a two-dimensional real clothes image selected by a user; the estimation and segmentation unit 302 is configured to perform posture estimation and image semantic segmentation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area; and the second generation unit 303 is configured to input the human body posture estimation, the human body dressing area and the two-dimensional real clothes image into the virtual fitting model and generate a virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence.
In some embodiments, for example in the present embodiment, the virtual fitting apparatus 300 further includes a first display unit, a second display unit, a third display unit and a selection unit.
The first display unit is configured to show the user a clothes selection interface comprising at least one clothes picture; the second display unit is configured to show photographing guide information to the user after receiving the user's trigger instruction of selecting a two-dimensional real clothes image, so that the user can take photos according to the guide information; the third display unit is configured to show the user a photo selection interface comprising at least one user picture; and the selection unit is configured to receive the user picture selected by the user and take it as the two-dimensional real person image.
The virtual fitting model training and virtual fitting apparatus described above may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 300 is a server or a terminal, and specifically, the server may be an independent server or a server cluster formed by a plurality of servers.
Referring to fig. 6, the computer device 300 includes a processor 302, a memory, which may include a storage medium 303 and an internal memory 304, and a network interface 305 connected by a system bus 301.
The storage medium 303 may store an operating system 3031 and a computer program 3032. The computer program 3032, when executed, causes the processor 302 to perform the virtual fitting model training method based on real person images or the virtual fitting method.
The processor 302 is used to provide computing and control capabilities to support the operation of the overall computer device 300.
The internal memory 304 provides an environment for running the computer program 3032 in the storage medium 303; when the computer program 3032 is executed by the processor 302, the processor 302 can execute the virtual fitting model training method based on real person images or the virtual fitting method.
The network interface 305 is used for network communication with other devices. It will be appreciated by those skilled in the art that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with aspects of the present invention, and is not intended to limit the computing device 300 to which aspects of the present invention may be applied, and that a particular computing device 300 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 302 is configured to run the computer program 3032 stored in the memory to implement the following steps: acquiring original training data, wherein the original training data comprises a two-dimensional original clothes image and a two-dimensional real person image wearing the corresponding clothes; performing posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area; inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features; inputting the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map; inputting the appearance flow estimation map, the human body posture estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image; and performing iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
In some embodiments, for example in this embodiment, when performing the steps of posture estimation and annotation on the two-dimensional real person image to obtain the human body posture estimation and the human body dressing area, the processor 302 specifically implements the following steps: performing posture estimation on the two-dimensional real person image through a posture detection model to obtain a human body posture estimation; and annotating the hair, face and lower-body clothes regions in the two-dimensional real person image to obtain a human body dressing area.
In some embodiments, for example, in this embodiment, when the processor 302 performs the steps of inputting the human body posture estimation, the human body dressing region, and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain a human body posture feature, a human body semantic feature, and a clothes feature, the following steps are specifically implemented: inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a feature pyramid network for feature extraction to obtain a plurality of posture features, a plurality of human body features and a plurality of clothes features; and respectively fusing the plurality of posture features, the plurality of human body features and the plurality of clothes features to obtain human body posture features, human body semantic features and clothes features.
In some embodiments, for example in this embodiment, when implementing the step of inputting the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map, the processor 302 specifically implements the following steps: inputting the human body posture features, the human body semantic features and the clothes features into the appearance flow estimation network to obtain an appearance flow; and deforming the two-dimensional original clothes image according to the appearance flow to obtain the appearance flow estimation map.
In some embodiments, for example in this embodiment, when implementing the step of iteratively training the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate the virtual fitting model, the processor 302 specifically implements the following steps: calculating a loss value through a preset loss function according to the virtual fitting image and the two-dimensional real person image; if the loss value is not less than a preset loss value and the number of training iterations is less than a preset number, performing iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network; and if the loss value is smaller than the preset loss value or the number of training iterations is not less than the preset number, taking the trained preset feature extraction network, appearance flow estimation network and generative adversarial network as the virtual fitting model.
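The stopping rule above amounts to the following loop; this is illustrative only, and the assumption that the model returns its own composite loss for each batch is a simplification of the patent's multi-network training.

```python
import torch

def train_fitting_model(model, data_loader, optimizer,
                        max_epochs=100, loss_threshold=0.05):
    """Iterate until the loss falls below the preset value or the
    preset number of training iterations is reached."""
    loss = torch.tensor(float("inf"))
    for epoch in range(max_epochs):
        for cloth_img, person_img in data_loader:
            # assumed interface: the model returns the fitting image
            # and its composite loss against the real person image
            _, loss = model(cloth_img, person_img)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < loss_threshold:
            break  # loss below the preset value: stop early
    return model
```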
The processor 302 is further configured to run the computer program 3032 stored in the memory to implement the following steps: acquiring a two-dimensional real person image and a two-dimensional real clothes image selected by a user; performing posture estimation and image semantic segmentation respectively on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area; and inputting the human body posture estimation, the human body dressing area and the two-dimensional real clothes image into a virtual fitting model, and generating a virtual fitting image through feature extraction, appearance flow estimation and synthesis processing in sequence.
In some embodiments, for example, in this embodiment, before the step of obtaining the two-dimensional real person image and the two-dimensional real clothes image selected by the user, the processor 302 further performs the following steps: displaying a clothing selection interface comprising at least one clothing picture to a user; after a trigger instruction of selecting a two-dimensional real clothes image by a user is received, displaying photographing guide information to the user so that the user can photograph according to the photographing guide information; displaying a photo selection interface comprising at least one user picture to a user; and receiving a user picture selected by a user, and taking the user picture as a two-dimensional real person image.
It should be understood that, in the embodiment of the present invention, the processor 302 may be a Central Processing Unit (CPU); the processor 302 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium that is computer-readable. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform any embodiment of the virtual fitting model training method based on real person images or of the virtual fitting method described above.
The storage medium may be any of various computer-readable storage media that can store program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk or an optical disk.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both; to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general functional terms. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be combined, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, while the invention has been described with respect to the above-described embodiments, it will be understood that the invention is not limited thereto but may be embodied with various modifications and changes.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A virtual fitting model training method based on real person images, characterized by comprising:
acquiring original training data, wherein the original training data comprises a two-dimensional original clothes image and a two-dimensional real person image wearing the corresponding clothes;
performing posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area;
inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features;
inputting the two-dimensional original clothes image, the human body posture features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map;
inputting the appearance flow estimation map, the human body posture estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image;
and performing iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
2. The method of claim 1, wherein the performing posture estimation and annotation on the two-dimensional real person image to obtain a human body posture estimation and a human body dressing area comprises:
performing posture estimation on the two-dimensional real person image through a posture detection model to obtain a human body posture estimation;
and annotating the hair, face and lower-body clothes regions in the two-dimensional real person image to obtain a human body dressing area.
3. The method of claim 1, wherein the preset feature extraction network is a feature pyramid network, and the inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into the preset feature extraction network for feature extraction and fusion to obtain human body posture features, human body semantic features and clothes features comprises:
inputting the human body posture estimation, the human body dressing area and the two-dimensional original clothes image into a feature pyramid network for feature extraction to obtain a plurality of posture features, a plurality of human body features and a plurality of clothes features;
and respectively fusing the plurality of posture features, the plurality of human body features and the plurality of clothes features to obtain human body posture features, human body semantic features and clothes features.
4. The method according to claim 1, wherein the inputting the two-dimensional original clothes image, the human body pose features, the human body semantic features and the clothes features into the appearance flow estimation network to obtain an appearance flow estimation map comprises:
inputting the human body pose features, the human body semantic features and the clothes features into the appearance flow estimation network to obtain an appearance flow; and
warping the two-dimensional original clothes image according to the appearance flow to obtain the appearance flow estimation map.
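The warping in claim 4 is commonly realized by resampling the clothes image along the predicted flow field. The sketch below assumes the flow is a per-pixel offset field in pixel units; F.grid_sample performs the actual sampling. This is one standard realization, not necessarily the patent's.

```python
# Warping a clothes image with a dense appearance flow (pixel-offset field).
import torch
import torch.nn.functional as F

def warp_with_appearance_flow(clothes: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """clothes: (N, 3, H, W) image; flow: (N, 2, H, W) per-pixel (dx, dy) offsets."""
    n, _, h, w = clothes.shape
    # Identity sampling grid in the normalized [-1, 1] coordinates grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    # Convert pixel offsets to normalized offsets, then displace the grid.
    offset = flow.permute(0, 2, 3, 1) / torch.tensor([w / 2.0, h / 2.0])
    return F.grid_sample(clothes, grid + offset, align_corners=True)
```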
5. The method according to claim 1, wherein the performing iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model comprises:
calculating a loss value through a preset loss function according to the virtual fitting image and the two-dimensional real person image;
if the loss value is not less than a preset loss value and the number of training iterations is less than a preset number of training iterations, continuing the iterative training of the preset feature extraction network, the appearance flow estimation network and the generative adversarial network; and
if the loss value is less than the preset loss value or the number of training iterations is not less than the preset number of training iterations, taking the trained preset feature extraction network, appearance flow estimation network and generative adversarial network as the virtual fitting model.
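Claim 5's stopping rule, sketched as a plain training loop. The names pipeline, criterion, optimizer and loader are assumed to exist, and the threshold and iteration cap are placeholder values, since the claim only calls them "preset".

```python
# Hedged sketch of claim 5's dual stopping criterion (loss threshold OR iteration cap).
def train_until_converged(pipeline, criterion, optimizer, loader,
                          loss_threshold=0.05, max_iters=100_000):
    """Assumes `loader` yields (inputs, real_image) pairs; values are illustrative."""
    for step, (inputs, real_image) in enumerate(loader):
        fitting_image = pipeline(*inputs)            # forward pass of all three nets
        loss = criterion(fitting_image, real_image)  # preset loss function
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < loss_threshold or step + 1 >= max_iters:
            break  # either condition in claim 5 ends training
    return pipeline  # the trained networks form the virtual fitting model
```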
6. A virtual fitting method, characterized by comprising:
acquiring a two-dimensional real person image and a two-dimensional real clothes image selected by a user;
performing pose estimation and image semantic segmentation on the two-dimensional real person image, respectively, to obtain a human body pose estimation and a human body dressing area; and
inputting the human body pose estimation, the human body dressing area and the two-dimensional real clothes image into a virtual fitting model trained by the method according to any one of claims 1 to 5, and generating a virtual fitting image after feature extraction, appearance flow estimation and synthesis processing in sequence.
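Claim 6's segmentation step could, purely for illustration, use a generic semantic segmentation model such as torchvision's DeepLabV3. This choice is an assumption: DeepLabV3's Pascal VOC classes only separate the person from the background, whereas the patent's dressing area would require a human-parsing model that distinguishes hair, face and clothes regions.

```python
# Illustrative semantic segmentation of the person image (model choice is assumed).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

seg_model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
seg_model.eval()

image = to_tensor(Image.open("person.jpg").convert("RGB")).unsqueeze(0)  # hypothetical
with torch.no_grad():
    mask = seg_model(image)["out"].argmax(dim=1)  # (1, H, W) per-pixel class labels
```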
7. The method according to claim 6, wherein before the acquiring of the two-dimensional real person image and the two-dimensional real clothes image selected by the user, the method further comprises:
presenting a clothes selection interface comprising at least one clothes picture to the user;
after receiving a trigger instruction indicating that the user has selected a two-dimensional real clothes image, displaying photographing guidance information to the user so that the user can take a photograph according to the photographing guidance information;
displaying a photo selection interface comprising at least one user picture to the user; and
receiving a user picture selected by the user, and taking the user picture as the two-dimensional real person image.
8. A virtual fitting model training device based on real person images, characterized by comprising:
a first acquisition unit, configured to acquire original training data comprising a two-dimensional original clothes image and a two-dimensional real person image of a person wearing the corresponding clothes;
an estimation and annotation unit, configured to perform pose estimation and annotation on the two-dimensional real person image to obtain a human body pose estimation and a human body dressing area;
a feature extraction and fusion unit, configured to input the human body pose estimation, the human body dressing area and the two-dimensional original clothes image into a preset feature extraction network for feature extraction and fusion to obtain human body pose features, human body semantic features and clothes features;
an appearance flow estimation unit, configured to input the two-dimensional original clothes image, the human body pose features, the human body semantic features and the clothes features into an appearance flow estimation network to obtain an appearance flow estimation map;
a first generation unit, configured to input the appearance flow estimation map, the human body pose estimation and the human body dressing area into a generative adversarial network to obtain a virtual fitting image; and
a training unit, configured to perform iterative training on the preset feature extraction network, the appearance flow estimation network and the generative adversarial network according to the virtual fitting image and the two-dimensional real person image to generate a virtual fitting model.
9. A virtual fitting apparatus, characterized by comprising:
a second acquisition unit, configured to acquire a two-dimensional real person image and a two-dimensional real clothes image selected by a user;
an estimation and segmentation unit, configured to perform pose estimation and image semantic segmentation on the two-dimensional real person image, respectively, to obtain a human body pose estimation and a human body dressing area; and
a second generation unit, configured to input the human body pose estimation, the human body dressing area and the two-dimensional real clothes image into a virtual fitting model trained by the method according to any one of claims 1 to 5, and to generate a virtual fitting image after feature extraction, appearance flow estimation and synthesis processing in sequence.
10. A computer device, characterized in that the computer device comprises a memory storing a computer program and a processor, wherein the processor, when executing the computer program, implements the virtual fitting model training method based on real person images according to any one of claims 1 to 5.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210593210.0A | 2022-05-27 | 2022-05-27 | Virtual fitting model training method based on real character image, virtual fitting method, device and equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210593210.0A | 2022-05-27 | 2022-05-27 | Virtual fitting model training method based on real character image, virtual fitting method, device and equipment
Publications (1)

Publication Number | Publication Date
---|---
CN114926324A | 2022-08-19
Family
ID=82810375
Family Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210593210.0A (Pending) | 2022-05-27 | 2022-05-27 | Virtual fitting model training method based on real character image, virtual fitting method, device and equipment
Country Status (1)

Country | Link
---|---
CN | CN114926324A (en)
Cited By (2)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN116310659A * | 2023-05-17 | 2023-06-23 | 中数元宇数字科技(上海)有限公司 | Training data set generation method and device
CN116310659B * | 2023-05-17 | 2023-08-08 | 中数元宇数字科技(上海)有限公司 | Training data set generation method and device
Similar Documents

Publication | Title
---|---
US11915352B2 | Processing user selectable product images and facilitating visualization-assisted virtual dressing
EP3479296B1 | System of virtual dressing utilizing image processing, machine learning, and computer vision
JP2020522285A | System and method for whole body measurement extraction
CN111882408B | Virtual trial method and device, electronic equipment and storage medium
CN111783506B | Method, apparatus and computer readable storage medium for determining target characteristics
US20220284678A1 | Method and apparatus for processing face information and electronic device and storage medium
CN111767817B | Dress collocation method and device, electronic equipment and storage medium
CN111723707A | Method and device for estimating fixation point based on visual saliency
CN114723888B | Three-dimensional hair model generation method, device, equipment, storage medium and product
CN111754303A | Method and apparatus for virtual changing of clothing, device and medium
CN111680573B | Face recognition method, device, electronic equipment and storage medium
CN114937286A | Virtual fitting method, device, equipment and medium
CN113706373A | Model reconstruction method and related device, electronic equipment and storage medium
CN115272822A | Method for training analytic model, virtual fitting method and related device
CN114926324A | Virtual fitting model training method based on real character image, virtual fitting method, device and equipment
CN112258389B | Virtual reloading method and related equipment
CN115222895B | Image generation method, device, equipment and storage medium
CN116452291A | Virtual fitting method, virtual fitting device, electronic equipment and storage medium
CN115994994A | Virtual fitting and fitting model training method, device and equipment
CN115439309A | Method for training clothes deformation model, virtual fitting method and related device
CN113177891A | Image processing method, image processing device, electronic equipment and storage medium
CN111462337A | Image processing method, device and computer readable storage medium
CN116977417B | Pose estimation method and device, electronic equipment and storage medium
CN113569781B | Human body posture acquisition method and device, electronic equipment and storage medium
CN115147508B | Training of clothing generation model and method and device for generating clothing image
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination