CN114170403A - Virtual fitting method, device, server and storage medium - Google Patents


Info

Publication number
CN114170403A
Authority
CN
China
Prior art keywords
image
target
human body
clothes
body image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111404834.5A
Other languages
Chinese (zh)
Inventor
张阿强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd filed Critical Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202111404834.5A
Publication of CN114170403A
Legal status: Pending

Classifications

    • G06T 19/006 Mixed reality
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2210/44 Morphing

Abstract

The embodiment of the application relates to the technical field of image processing, and discloses a virtual fitting method, a device, a server and a storage medium.

Description

Virtual fitting method, device, server and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a virtual fitting method, a virtual fitting device, a server and a storage medium.
Background
Virtual fitting means that a virtual fitting device automatically displays, through virtual technical means, a three-dimensional image of the wearer after a new garment has been fitted. Virtual fitting technology is a technical application that lets a customer see the effect of new clothes on the body without actually changing clothes.
At present, the Parser-Free Appearance Flow Network (PF-AFN) algorithm is a virtual fitting algorithm with good results. It applies a distillation algorithm to the appearance flow: a try-on effect image is first generated by a parser-based 2D fitting algorithm, which serves as the teacher network, and the try-on effect image is then used to supervise a student network that learns the teacher network's appearance flow generation.
In the process of implementing the embodiments of the present application, the inventors found that the current technical solution has at least the following technical problems: the current PF-AFN algorithm occupies substantial resources for high-resolution input images, and its clothes fusion effect is insufficient.
Disclosure of Invention
The technical problem mainly solved by the embodiments of the present application is to provide a virtual fitting method, device, server and storage medium, so as to improve the fusion effect of clothes.
In a first aspect, an embodiment of the present application provides a virtual fitting method, including:
acquiring a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images;
scaling the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image;
generating a first target fusion image according to the second target human body image and the second target clothes image;
generating a first target processing image according to the first target human body image, wherein the first target processing image does not comprise clothes information and arm information;
and inputting the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate a target dressing image.
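For illustration only, the following PyTorch-style sketch shows how this inference flow could be wired together; the module names (low_res_fusion, remove_clothes_and_arms, first_clothes_model), the 256×192 working resolution, the bilinear scaling and the upscaling of the low-resolution fusion image (mirroring the training step described below) are assumptions, not part of the original disclosure.

```python
import torch.nn.functional as F

def virtual_try_on(person_hr, clothes_hr, low_res_fusion, remove_clothes_and_arms,
                   first_clothes_model, low_res=(256, 192)):
    # person_hr, clothes_hr: first target human body / clothes images, (N, 3, H, W), high resolution.
    # Scale both inputs down to the working resolution (second target human body / clothes images).
    person_lr = F.interpolate(person_hr, size=low_res, mode='bilinear', align_corners=False)
    clothes_lr = F.interpolate(clothes_hr, size=low_res, mode='bilinear', align_corners=False)

    # First target fusion image, generated from the scaled images.
    fused_lr = low_res_fusion(person_lr, clothes_lr)

    # First target processing image: the person image without clothes and arm information.
    person_processed = remove_clothes_and_arms(person_hr)

    # The first clothes model fuses everything back at the original resolution.
    fused_up = F.interpolate(fused_lr, size=person_hr.shape[-2:],
                             mode='bilinear', align_corners=False)
    return first_clothes_model(fused_up, clothes_hr, person_processed)  # target dressing image
```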
In some embodiments, the method further comprises: pre-training a first garment model, specifically comprising:
acquiring a first data set, wherein the first data set comprises a first human body image and a first clothes image, and the first human body image and the first clothes image are both high-resolution images;
scaling a first human body image and a first clothing image in a first data set to generate a second data set, wherein the second data set comprises the scaled first human body image and the scaled first clothing image;
generating a third data set according to the scaled first human body image and the scaled first clothes image in the second data set, wherein the third data set comprises a first fusion image;
generating a first processed image according to the first human body image, wherein the first processed image does not include clothing information and arm information;
scaling the first fusion image to generate a second fusion image, wherein the second fusion image has the same resolution as the first human body image;
inputting the first human body image, the first processed image and the second fusion image, and training the first clothes model.
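A rough sketch of this pre-training loop, assuming the fusion image is produced by a frozen low-resolution pipeline and that the first clothes model is supervised with an L1 reconstruction loss against the original high-resolution person image; the loss choice, the optimizer and all names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def train_first_clothes_model(first_clothes_model, low_res_fusion, remove_clothes_and_arms,
                              data_loader, epochs=10, lr=1e-4, low_res=(256, 192)):
    opt = torch.optim.Adam(first_clothes_model.parameters(), lr=lr)
    for _ in range(epochs):
        for person_hr, clothes_hr in data_loader:           # first data set (high-resolution pairs)
            # Second data set: scaled first human body / clothes images.
            person_lr = F.interpolate(person_hr, size=low_res, mode='bilinear', align_corners=False)
            clothes_lr = F.interpolate(clothes_hr, size=low_res, mode='bilinear', align_corners=False)

            # Third data set: first fusion image at the working resolution (frozen producer).
            with torch.no_grad():
                fused_lr = low_res_fusion(person_lr, clothes_lr)

            # Second fusion image: rescaled to the resolution of the first human body image.
            fused_hr = F.interpolate(fused_lr, size=person_hr.shape[-2:],
                                     mode='bilinear', align_corners=False)
            # First processed image: clothes and arm information removed.
            person_processed = remove_clothes_and_arms(person_hr)

            pred = first_clothes_model(person_hr, person_processed, fused_hr)
            loss = F.l1_loss(pred, person_hr)                # assumed reconstruction target
            opt.zero_grad()
            loss.backward()
            opt.step()
```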
In some embodiments, generating the first target fusion image from the second target human body image and the second target clothes image includes:
and inputting the second target human body image and the second target clothes image into an appearance flow deformation module and a generation module to generate a first target fusion image.
In some embodiments, inputting the second target human body image and the second target clothing image to the appearance flow warping module and the generating module, generating the first target fusion image, comprises:
inputting the second target human body image and the second target clothes image into an appearance flow deformation module to generate second deformation information;
deforming the second target clothes image through the second deformation information to generate a third target clothes image;
and inputting the second target human body image and the third target clothes image into a generating module to generate a first target fusion image.
In some embodiments, the apparent flow deformation module comprises: a first appearance flow warping module and a second appearance flow warping module, wherein the second appearance flow warping module is used for generating second warping information, and the method further comprises:
training a second appearance flow deformation module specifically comprises:
acquiring a first human body image and a first clothes image;
acquiring human body image analysis information according to the first human body image;
processing the human body image analysis information and the first clothes image through a first appearance flow deformation module to generate first deformation information;
acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a third clothes image;
fusing the third clothes image and the human body image analysis information to generate a first fitting image;
and training a second appearance flow deformation module according to the first clothes image and the first fitting image.
In some embodiments, the method further comprises:
and in the training process of the second appearance flow deformation module, knowledge distillation is carried out on the second appearance flow deformation module through the first appearance flow deformation module.
In some embodiments, the first appearance flow deformation module comprises a resolver appearance flow deformation module and the second appearance flow deformation module comprises a resolver-free appearance flow deformation module.
In a second aspect, an embodiment of the present application provides a virtual fitting apparatus, including:
the system comprises an acquisition unit, a display unit and a control unit, wherein the acquisition unit is used for acquiring a first target human body image and a first target clothes image, and the first target human body image and the first target clothes image are both high-resolution images;
the scaling unit is used for scaling the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image;
the generating unit is used for generating a first target fusion image according to the second target human body image and the second target clothes image; generating a first target processing image according to the first target human body image, wherein the first target processing image does not comprise clothes information and arm information;
and the fusion unit is used for inputting the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate a target dressing image.
In a third aspect, an embodiment of the present application provides a server, including:
a memory and one or more processors, the one or more processors being configured to execute one or more computer programs stored in the memory; when executing the one or more computer programs, the one or more processors cause the server to implement the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to the first aspect.
The beneficial effects of the embodiments of the present application are as follows: different from the situation in the prior art, an embodiment of the present application provides a virtual fitting method, including: acquiring a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images; scaling the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image; generating a first target fusion image according to the second target human body image and the second target clothes image; generating a first target processing image according to the first target human body image, wherein the first target processing image does not comprise clothes information and arm information; and inputting the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate a target dressing image.
The method obtains the high-resolution first target human body image and first target clothes image, scales them to obtain the second target human body image and the second target clothes image, from which the first target fusion image is generated; a first target processing image that does not include clothes information or arm information is generated from the first target human body image; and the first target fusion image, the first target clothes image and the first target processing image are input into the first clothes model to generate the target dressing image. Because the fusion is performed at the scaled, lower resolution and the first clothes model restores the result for the high-resolution inputs, the resources occupied by high-resolution input images are reduced and the clothes fusion effect is improved.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements and which are not drawn to scale unless otherwise specified.
Fig. 1 is a schematic application environment diagram of a virtual fitting method provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a virtual fitting method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a virtual fitting method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an appearance flow distortion module according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a virtual fitting apparatus according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a virtual fitting method according to a second embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a virtual fitting method according to a second embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a virtual fitting apparatus according to a second embodiment of the present application;
fig. 9 is a schematic flowchart of a virtual fitting method according to a third embodiment of the present application;
FIG. 10 is a schematic diagram of a third embodiment of the present application for generating a first target fusion image;
fig. 11 is a schematic diagram of a training second appearance flow transformation module according to a third embodiment of the present application;
FIG. 12 is a schematic diagram of a training first garment model provided in the third embodiment of the present application;
fig. 13 is a schematic structural diagram of a virtual fitting apparatus according to a third embodiment of the present application;
fig. 14 is a schematic hardware structure diagram of a server according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present application, but are not intended to limit the present application in any way. It should be noted that various changes and modifications can be made by one skilled in the art without departing from the spirit of the application, all of which fall within the scope of protection of the present application.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that, provided they do not conflict, the various features of the embodiments of the present application may be combined with each other within the scope of protection of the present application. Additionally, although functional modules are divided in the apparatus schematics and logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the module division in the apparatus or the order in the flowcharts. Further, the terms "first," "second," "third," and the like used herein do not limit the data or the execution order, but merely distinguish identical or similar items having substantially the same function and effect.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical features mentioned in the embodiments of the present application described below may be combined with each other as long as they do not conflict with each other.
Before the present application is explained in detail, terms and expressions referred to in the embodiments of the present application are explained, and the terms and expressions referred to in the embodiments of the present application are applicable to the following explanations:
(1) A generative adversarial network (GAN) makes the samples produced by the generator network obey the real data distribution by means of adversarial training. The other network is the discriminator network, whose goal is to judge as accurately as possible whether a sample comes from the real data or was produced by the generator network; when the networks finally converge, if the discriminator network can no longer judge the source of a sample, this is equivalent to the generator network being able to generate samples that conform to the real data distribution.
(2) Appearance flow refers to a set of two-dimensional coordinate vectors indicating which pixels in the source image can be used to synthesize the target image.
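In code, an appearance flow of this kind can be applied with a sampling-grid operation; the sketch below assumes the flow is stored as per-pixel offsets in pixel units and is not taken from the patent itself.

```python
import torch
import torch.nn.functional as F

def warp_with_appearance_flow(source, flow):
    """Warp `source` (N, C, H, W) using an appearance flow `flow` (N, 2, H, W) of pixel offsets."""
    n, _, h, w = source.shape
    # Base sampling grid in pixel coordinates (x, y).
    ys, xs = torch.meshgrid(torch.arange(h, dtype=source.dtype, device=source.device),
                            torch.arange(w, dtype=source.dtype, device=source.device),
                            indexing='ij')
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)        # (1, 2, H, W)
    coords = base + flow                                     # where each target pixel reads from
    # Normalise coordinates to [-1, 1] as required by grid_sample.
    norm_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    norm_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)             # (N, H, W, 2)
    return F.grid_sample(source, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)
```

Later sketches in this description reuse this helper wherever a clothes image or feature map is deformed by an appearance flow.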
(3) Knowledge Distillation (KD) refers to a model compression method: a training method based on the "teacher-student network" idea, in which the knowledge contained in a trained model is distilled and extracted into another model. A soft target related to the teacher network (complex, but with excellent inference performance) is introduced as part of the total loss to induce the training of the student network (simple, with low complexity), thereby achieving knowledge transfer: a more complex teacher network (typically an ensemble of several networks) is trained first, and the output of this large network is then used as the soft target to train the student network.
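As a generic illustration of the soft-target idea (not the specific losses used in this application), a distillation loss commonly combines a temperature-scaled KL term against the teacher's outputs with the ordinary supervised loss:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    # Soft targets from the teacher induce the student; the temperature softens both distributions.
    soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                    F.softmax(teacher_logits / temperature, dim=1),
                    reduction='batchmean') * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)   # ordinary supervised (hard-label) loss
    return alpha * soft + (1.0 - alpha) * hard
```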
The technical scheme of the application is specifically explained in the following by combining the drawings in the specification.
Referring to fig. 1, fig. 1 is a schematic view of an application environment of a virtual fitting method according to an embodiment of the present disclosure;
as shown in fig. 1, the application environment 100 includes: a terminal 101 and a server 102, the terminal 101 and the server 102 communicating by wired or wireless communication.
The terminal 101 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or the like, but is not limited thereto. The terminal 101 may be provided with a client, which may be a video client, a browser client, an online shopping client, an instant messaging client, or the like, and the type of the client is not limited in the present application.
The terminal 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto. The terminal 101 may receive the clothes images sent by the server 102 and display them on a visual interface, and the terminal 101 may further set a corresponding try-on button next to each clothes image to provide a try-on function. The user can browse the clothes images and trigger a fitting instruction for any one of the clothes images by pressing the fitting button corresponding to that image. The terminal can respond to the fitting instruction and collect a human body image through an image collection device, which may be built into the terminal 101 or externally connected to the terminal 101; the present application is not limited in this regard.
The terminal 101 may send the fitting instruction and the collected human body image to the server 102, receive the target dressing image returned by the server 102, and further display the target dressing image on a visual interface, so that the user can know the upper body effect of the clothes.
It is understood that the terminal 101 may be generally referred to as one of a plurality of terminals, and the embodiment of the present application is illustrated by the terminal 101. Those skilled in the art will appreciate that the number of terminals described above may be greater or fewer. For example, the number of the terminals may be only one, or the number of the terminals may be several tens or several hundreds, or more, and the number of the terminals and the type of the device are not limited in the embodiment of the present application.
The server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
The server 102 and the terminal 101 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto. Server 102 may maintain a clothing image database for storing a plurality of clothing images. The server 102 may receive the fitting instruction and the human body image sent by the terminal 101, acquire a clothes image corresponding to the fitting instruction from a clothes image database according to the fitting instruction, generate a target dressing image based on the clothes image and the human body image, and send the target dressing image to the terminal 101.
It is understood that the number of the servers 102 may be more or less, and the embodiment of the present application is not limited thereto. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.
Example one
Referring to fig. 2, fig. 2 is a schematic flow chart of a virtual fitting method according to an embodiment of the present application;
the virtual fitting method is applied to a server, and particularly, an execution main body of the virtual fitting method is one or more processors of the server.
As shown in fig. 2, the virtual fitting method includes:
step S201: acquiring a first human body image and a first clothes image;
Specifically, the first human body image is an image of a person wearing clothes, that is, the first human body image includes person information and clothes information. It is understood that the first human body image further includes a background portion, and the person in the first human body image may be in various poses, such as hands on the hips or arms hanging naturally at the sides, which is not limited in the embodiments of the present application.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a virtual fitting method according to an embodiment of the present disclosure;
As shown in FIG. 3, a first human body image (I) and a first clothes image (I_c) are acquired.
Step S202: acquiring human body image analysis information according to the first human body image;
As shown in fig. 3, after the first human body image (I) is acquired, it is processed to obtain human body image analysis information (p*). Specifically, the human body image analysis information includes a human body mask map, a human body dense point segmentation map and a human body key point map. The human body mask map is obtained by segmenting the human body in the first human body image and removing the upper-garment part; the human body dense point segmentation map consists of dense human body pose key points (DensePose) and is obtained by processing with a human semantic parser (Human Parsing); the human body key point map is obtained by extracting human body key points with the pose estimator of a body tracking system (OpenPose). Optionally, other manners may also be used to obtain the human body key points, which is not limited in the embodiments of the present application.
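As a rough sketch, the analysis information p* can be treated as a multi-channel conditioning tensor assembled from the precomputed maps; the helper below only concatenates maps that are assumed to come from the external tools mentioned above (Human Parsing, DensePose, OpenPose) and does not reimplement those tools.

```python
import torch

def build_parsing_input(body_mask, densepose_map, keypoint_heatmaps):
    """Concatenate precomputed analysis maps into one conditioning tensor p*.

    body_mask:          (N, 1, H, W)  human body mask with the upper garment removed
    densepose_map:      (N, C1, H, W) dense-pose part segmentation
    keypoint_heatmaps:  (N, C2, H, W) one heatmap per human body key point
    """
    return torch.cat((body_mask, densepose_map, keypoint_heatmaps), dim=1)
```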
Step S203: according to the human body image analysis information and the first clothes image, processing is carried out through an appearance flow deformation module, and first deformation information is generated;
Specifically, the human body image analysis information and the first clothes image are input into an appearance flow deformation module, and the appearance flow deformation module processes them to obtain first deformation information. The appearance flow deformation module includes a parser-based appearance flow warping module (PB-AFWM), which, as shown in fig. 3, processes the input human body image analysis information (p*) and the first clothes image (I_c) to obtain the first deformation information (u_f). The parser-based appearance flow warping module (PB-AFWM) contains an appearance flow warping module (AFWM) for predicting the dense correspondence between the clothes image and the person image in order to deform the clothes, where the output of the appearance flow warping module is an appearance flow, i.e. the first deformation information (u_f): a set of two-dimensional coordinate vectors, each vector indicating which pixels in the clothes image should be used to fill in a specific pixel of the person image.
Referring to fig. 4 again, fig. 4 is a schematic diagram of an appearance flow transformation module according to an embodiment of the present disclosure;
As shown in fig. 4, the Appearance Flow Warping Module (AFWM) is composed of two Pyramid Feature Extraction Networks (PFEN) and an Appearance Flow Estimation Network (AFEN). The PFENs extract two pyramids of deep features from the two inputs. Then, at each pyramid level, the AFEN learns to generate a coarse appearance flow and refines it at the next pyramid level. A second-order smoothing constraint is also employed when learning the appearance flow, to further preserve garment features such as logos and stripes.
The Pyramid Feature Extraction Network (PFEN) consists of a feature extraction network and a feature fusion structure, for example a backbone + FPN, where the backbone is a commonly used feature extraction network such as a VGG network or a ResNet network, and the FPN is a feature fusion structure through which features can be better extracted.
The Appearance Flow Estimation Network (AFEN) is composed of N appearance flow generation networks (FN), for example FN-1, FN-2 and FN-3, and estimates the appearance flow from the pyramid features of the N levels. For example: first, the extracted pyramid features (c_N, p_N) at the N-th pyramid level are input into the appearance flow generation network FN-1 to estimate an initial appearance flow f_1; then f_1 and the pyramid features at the (N-1)-th level are input into the appearance flow generation network FN-2 to obtain a better appearance flow f_2, and so on, until the best appearance flow f_N is obtained; the first clothes image is then deformed according to the appearance flow f_N to obtain the first deformation information.
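A compact sketch of this coarse-to-fine estimation, under assumed layer sizes and pyramid depth; the flow networks in `flow_nets` are placeholders for FN-1…FN-N, and the warp helper from the appearance-flow sketch above is reused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePyramidExtractor(nn.Module):
    """Stand-in for PFEN: a small backbone returning features at several scales (fine to coarse)."""
    def __init__(self, in_ch=3, channels=(32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for c in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True)))
            prev = c

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

def estimate_appearance_flow(clothes_feats, person_feats, flow_nets):
    """AFEN-style cascade: start from the coarsest level and refine the flow level by level."""
    flow = None
    for c_f, p_f, fn in zip(reversed(clothes_feats), reversed(person_feats), flow_nets):
        if flow is not None:
            # Upsample the coarser flow to this level and pre-warp the clothes feature with it.
            flow = 2.0 * F.interpolate(flow, size=c_f.shape[-2:], mode='bilinear', align_corners=False)
            c_f = warp_with_appearance_flow(c_f, flow)
        delta = fn(torch.cat((c_f, p_f), dim=1))   # each FN predicts a flow (or a flow update)
        flow = delta if flow is None else flow + delta
    return flow
```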
Specifically, each appearance flow generation network (FN) performs pixel-by-pixel feature matching to produce a coarse flow estimate and performs refinement at each pyramid level. Taking FN-2 as an example, its inputs are the two pyramid features (c_2, p_2) and the appearance flow f_1 from the previous pyramid level, and the FN operation can be roughly divided into four stages:
In the first stage, the initial appearance flow f_1 is up-sampled to obtain an appearance flow f'_1; then the vectors of c_2 are sampled, deforming c_2 into c'_2, where the sampling positions are specified by f'_1;
In the second stage, a correlation map r_2 is computed from c'_2 and p_2, where the j-th point of r_2 is a vector representing the result of the vector-matrix product between point j of c'_2 and the local displacement region of p_2 centered on point j. In this case, the channel dimension of r_2 is equal to the number of points in the local displacement region;
In the third stage, once r_2 is obtained, it is input into a Conv network to predict f'''_2, which is added to f'_1 to form a coarse appearance flow f''_2;
In the fourth stage, c_2 is deformed into c''_2 according to the newly generated appearance flow f''_2; then c''_2 and p_2 are concatenated and fed into a Conv network to compute a residual appearance flow f'_2, and the residual appearance flow f'_2 is added to the appearance flow f''_2 to obtain the final appearance flow f_2 of the appearance flow generation network FN-2.
In the embodiment of the present application, the Appearance Flow Estimation Network (AFEN) gradually refines the estimated appearance flow through the cascaded N appearance flow generation networks (FN) to capture the long-range correspondence between the clothes image and the person image, which helps to handle misalignment and deformation and enhances the clothes deformation effect.
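One refinement level of this four-stage procedure could look roughly as follows; the correlation radius, channel widths and network depths are assumptions, and the warp helper from the earlier appearance-flow sketch is reused.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def local_correlation(feat_a, feat_b, radius=3):
    """Correlation map: dot products between feat_a at each pixel and the
    (2*radius+1)^2 neighbourhood of feat_b around that pixel."""
    n, c, h, w = feat_a.shape
    padded = F.pad(feat_b, (radius, radius, radius, radius))
    corr = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + h, dx:dx + w]
            corr.append((feat_a * shifted).sum(dim=1, keepdim=True) / c)
    return torch.cat(corr, dim=1)                       # (N, (2*radius+1)^2, H, W)

class FlowRefineLevel(nn.Module):
    """One FN level: a coarse flow update from the correlation map (stages 1-3),
    then a residual flow from the re-warped feature and the person feature (stage 4)."""
    def __init__(self, feat_ch, radius=3):
        super().__init__()
        corr_ch = (2 * radius + 1) ** 2
        self.radius = radius
        self.coarse = nn.Sequential(nn.Conv2d(corr_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                                    nn.Conv2d(64, 2, 3, padding=1))
        self.refine = nn.Sequential(nn.Conv2d(2 * feat_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                                    nn.Conv2d(64, 2, 3, padding=1))

    def forward(self, clothes_feat, person_feat, flow_prev):
        # Stage 1: upsample the previous flow and warp the clothes feature with it.
        flow_up = 2.0 * F.interpolate(flow_prev, size=clothes_feat.shape[-2:],
                                      mode='bilinear', align_corners=False)
        warped = warp_with_appearance_flow(clothes_feat, flow_up)
        # Stages 2-3: correlation map, then a Conv network predicts the coarse flow update.
        corr = local_correlation(warped, person_feat, self.radius)
        flow_coarse = flow_up + self.coarse(corr)
        # Stage 4: warp again, concatenate with the person feature, predict a residual flow.
        warped_again = warp_with_appearance_flow(clothes_feat, flow_coarse)
        residual = self.refine(torch.cat((warped_again, person_feat), dim=1))
        return flow_coarse + residual
```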
Step S204: acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a third clothes image;
Specifically, the second clothes image is the target garment image to be virtually tried on by the user. The second clothes image is deformed (warped) by the first deformation information (u_f) to obtain a third clothes image, i.e. the third clothes image is the image obtained after the second clothes image is deformed.
Step S205: fusing the third clothes image and the human body image analysis information to generate a first fitting image;
As shown in fig. 3, the third clothes image and the human body image analysis information (p*) are fused to generate a first fitting image.
Specifically, the third clothes image and the human body image analysis information are fused to generate a first fitting image, and the method comprises the following steps:
and inputting the third clothes image and the human body image analysis information into a first generation module so as to fuse the third clothes image and the human body image analysis information and generate a first try-on image.
The first generation module includes a parser-based generation module (PB-GM). As shown in fig. 3, the inputs of the parser-based generation module (PB-GM) are the third clothes image and the human body image analysis information, and the parser-based generation module (PB-GM) fuses the third clothes image and the human body image analysis information to obtain the first fitting image.
Step S206: deforming the first clothes image through the first deformation information to generate a fourth clothes image;
Specifically, as shown in fig. 3, the first clothes image (I_c) is deformed (warped) by the first deformation information (u_f) to generate a fourth clothes image (S_w), where the fourth clothes image (S_w) is the image obtained after the first clothes image is deformed.
Step S207: and fusing the fourth clothes image and the first fitting image to generate a target dressing image.
Specifically, the fusing the fourth clothing image and the first fitting image to generate the target clothing image includes:
and inputting the fourth clothes image and the first fitting image into a second generation module to fuse the fourth clothes image and the first fitting image to generate a target dressing image.
The second generation module includes a parser-free generation module (PF-GM). As shown in fig. 3, the inputs of the parser-free generation module (PF-GM) are the first fitting image and the fourth clothes image (S_w), where the fourth clothes image (S_w) is obtained by deforming (warping) the first clothes image (I_c) with the first deformation information (u_f).
In an embodiment of the present application, the method further includes:
knowledge distillation is carried out on the second generation module through the first generation module, so that the second generation module learns the feature extraction mode and the feature fusion mode of the first generation module.
Specifically, the first generation module includes the parser-based generation module (PB-GM) and the second generation module includes the parser-free generation module (PF-GM). As shown in fig. 3, performing distillation learning on the parser-free generation module (PF-GM) through the parser-based generation module (PB-GM) includes: the parser-free generation module (PF-GM) acts as the student network, and the loss and intermediate features of the student network (PF-GM) are constrained by the loss or intermediate features of the teacher network (PB-GM), so that the student network learns the feature extraction manner and feature fusion manner of the teacher network (PB-GM). It can be understood that the first generation module and the second generation module are both generation networks, for example deep-learning image segmentation networks such as a UNet network, which generate the target image from the different inputs through a loss function or constraint condition.
Specifically, the first generation module includes a first appearance flow deformation module and the second generation module includes a second appearance flow deformation module, and performing knowledge distillation on the second generation module through the first generation module includes:
and distance calculation is carried out on the feature layer in the second appearance flow deformation module through the feature layer in the first appearance flow deformation module, the calculated distance is used as a loss function, and the second generation module is trained, so that the feature layer in the second appearance flow deformation module is close to the feature layer in the first appearance flow deformation module. The first appearance flow deformation module and the second appearance flow deformation module are both appearance flow deformation modules (AFWM), and the first appearance flow deformation module and the second appearance flow deformation module have the same structure.
For example: the same appearance flow warping module is present in both PF-GM and PB-GM, with feature layers (p1, p2, p3) in the PFEN module of the Appearance Flow Warping Module (AFWM). By computing the Euclidean distances between the corresponding feature layers (p1, p2, p3) in PF-GM and PB-GM and using these distances as a loss function, the PF-GM model is trained so that the feature layers (p1, p2, p3) in PF-GM gradually approach the feature layers (p1, p2, p3) in PB-GM, thereby completing the distillation learning process.
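Under the assumption that both warping modules expose their pyramid feature layers, the feature-level distillation loss described here could be written as:

```python
import torch

def feature_distillation_loss(student_feats, teacher_feats):
    """Sum of Euclidean distances between corresponding feature layers (p1, p2, p3).

    student_feats: feature layers from the parser-free module (PF-GM / PF-AFWM)
    teacher_feats: feature layers from the parser-based module (PB-GM / PB-AFWM), kept fixed
    """
    loss = 0.0
    for p_student, p_teacher in zip(student_feats, teacher_feats):
        # Normalising by the number of elements is an assumption, so that layers of
        # different sizes contribute comparably to the total loss.
        loss = loss + torch.norm(p_student - p_teacher.detach(), p=2) / p_student.numel()
    return loss
```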
Performing knowledge distillation on the second generation module through the first generation module, so that the second generation module learns the feature extraction manner and feature fusion manner of the first generation module, can improve the network expression capability of the parser-free generation module (PF-GM), which helps to improve the clothes fusion effect.
In an embodiment of the present application, the method further includes:
acquiring a generation countermeasure network, wherein the generation countermeasure network comprises a generation network and a judgment network; acquiring a first loss function corresponding to a generated network and a second loss function of a judgment network; the second generation module is trained based on the first loss function and the second loss function.
Specifically, a generative adversarial network (GAN) is a new framework that estimates a generative model through an adversarial process. A generative adversarial network refers to a deep generative model trained in an adversarial manner; the discrimination network and the generation network it contains can use different network structures depending on the generation task. In the GAN, the generation network and the discrimination network play a non-cooperative zero-sum game. The generation network of the GAN captures the latent distribution of real data samples and generates new data samples; the discrimination network of the GAN is a binary classifier that judges whether its input is real data or a generated sample; both the generation network and the discrimination network of the GAN can use a perceptron or a deep learning model. The optimization process of the GAN is a minimax game problem, and the optimization goal is to reach a Nash equilibrium, that is, until the discriminant model (Discriminator) cannot identify whether a fake sample produced by the generative model (Generator) is real or fake. When training finally converges, if the discrimination network can no longer judge the source of a sample, this is equivalent to the generation network being able to generate samples that conform to the real data distribution.
Specifically, a first loss function corresponding to the generation network and a second loss function of the discrimination network are obtained, and the second generation module is adversarially trained based on the first loss function and the second loss function.
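A generic sketch of one such adversarial training step, alternating a discriminator update and a generator update; the binary cross-entropy losses, the optimizers and the discriminator itself are illustrative assumptions rather than the patent's specific choices.

```python
import torch
import torch.nn.functional as F

def adversarial_step(generator, discriminator, g_opt, d_opt, gen_inputs, real_images):
    # Discriminator step (second loss function): real samples -> 1, generated samples -> 0.
    fake = generator(*gen_inputs)
    d_real = discriminator(real_images)
    d_fake = discriminator(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step (first loss function): push generated samples toward the real distribution.
    g_out = discriminator(fake)
    g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```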
In the embodiment of the present application, an adversarial training approach is adopted when training the second generation module, which helps improve the generation capability of the second generation module, gives the second generation module better retention capability and a better fusion effect, and facilitates generating a realistic target dressing image.
In an embodiment of the present application, there is provided a virtual fitting method, including: acquiring a first human body image and a first clothes image; acquiring human body image analysis information according to the first human body image; according to the human body image analysis information and the first clothes image, processing is carried out through an appearance flow deformation module, and first deformation information is generated; acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a third clothes image; fusing the third clothes image and the human body image analysis information to generate a first fitting image; deforming the first clothes image through the first deformation information to generate a fourth clothes image; and fusing the fourth clothes image and the first fitting image to generate a target dressing image.
On the one hand, the human body image analysis information is obtained from the first human body image, and the first deformation information is generated by processing the human body image analysis information and the first clothes image through the appearance flow deformation module, so that the appearance flow can be used for distillation learning while the clothes characteristics and details are preserved;
On the other hand, the second clothes image is acquired and deformed by the first deformation information to generate the third clothes image; the third clothes image and the human body image analysis information are fused to generate the first fitting image; the first clothes image is deformed by the first deformation information to generate the fourth clothes image; and the fourth clothes image and the first fitting image are fused to generate the target dressing image. In this way, the first clothes image can be deformed using the first deformation information, the first fitting image obtained by fusing the human body image analysis information is further fused with the fourth clothes image, and the clothes fusion effect can be improved.
Referring to fig. 5 again, fig. 5 is a schematic structural diagram of a virtual fitting device according to an embodiment of the present application;
the virtual fitting device is applied to a server, and particularly, the virtual fitting device is applied to one or more processors of the server.
As shown in fig. 5, the virtual fitting apparatus 50 includes:
an acquiring unit 501 for acquiring a first human body image and a first garment image; acquiring human body image analysis information according to the first human body image;
the deformation unit 502 is used for processing the human body image analysis information and the first clothes image through the appearance flow deformation module to generate first deformation information; the first deformation information is used for deforming the first clothes image to generate a fourth clothes image;
the generating unit 503 is configured to acquire a second garment image, deform the second garment image according to the first deformation information, and generate a third garment image;
a fusion unit 504, configured to fuse the third clothes image and the human body image analysis information to generate a first fitting image; and fusing the fourth clothes image and the first fitting image to generate a target dressing image.
In the embodiment of the present application, the virtual fitting apparatus may also be built by hardware devices, for example, the virtual fitting apparatus may be built by one or more than two chips, and the chips may work in coordination with each other to complete the virtual fitting method described in the above embodiments. For another example, the virtual fitting apparatus may also be constructed by various logic devices, such as a general processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip, an arm (acorn RISC machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components.
The virtual fitting device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The virtual fitting device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The virtual fitting device provided by the embodiment of the application can realize each process realized by fig. 2, and is not repeated here for avoiding repetition.
It should be noted that the virtual fitting apparatus can execute the virtual fitting method provided in the embodiments of the present application, and has the functional modules and beneficial effects corresponding to the executed method. For technical details that are not described in detail in the embodiment of the virtual fitting apparatus, reference may be made to the virtual fitting method provided in the embodiments of the present application.
In an embodiment of the present application, there is provided a virtual fitting apparatus including: an acquisition unit, configured to acquire a first human body image and a first clothes image, and to acquire human body image analysis information according to the first human body image; a deformation unit, configured to process the human body image analysis information and the first clothes image through the appearance flow deformation module to generate first deformation information, the first deformation information being used to deform the first clothes image to generate a fourth clothes image; a generating unit, configured to acquire a second clothes image and deform the second clothes image through the first deformation information to generate a third clothes image; and a fusion unit, configured to fuse the third clothes image and the human body image analysis information to generate a first fitting image, and to fuse the fourth clothes image and the first fitting image to generate a target dressing image. On the one hand, the human body image analysis information is obtained from the first human body image, and the first deformation information is generated by processing the human body image analysis information and the first clothes image through the appearance flow deformation module, so that the appearance flow can be used for distillation learning while the clothes characteristics and details are preserved; on the other hand, the second clothes image is acquired and deformed by the first deformation information to generate the third clothes image, the third clothes image and the human body image analysis information are fused to generate the first fitting image, the first clothes image is deformed by the first deformation information to generate the fourth clothes image, and the fourth clothes image and the first fitting image are fused to generate the target dressing image. In this way, the first clothes image can be deformed using the first deformation information, the first fitting image obtained by fusing the human body image analysis information is further fused with the fourth clothes image, and the clothes fusion effect can be improved.
Example two
Referring to fig. 6, fig. 6 is a schematic flow chart of a virtual fitting method according to a second embodiment of the present application;
the virtual fitting method is applied to a server, and particularly, an execution main body of the virtual fitting method is one or more processors of the server.
As shown in fig. 6, the virtual fitting method includes:
step S601: acquiring a first human body image and a first clothes image;
Specifically, the first human body image is an image of a person wearing clothes, that is, the first human body image includes person information and clothes information. It is understood that the first human body image further includes a background portion, and the person in the first human body image may be in various poses, such as hands on the hips or arms hanging naturally at the sides, which is not limited in the embodiments of the present application.
Step S602: acquiring human body image analysis information according to the first human body image;
Referring to fig. 3 again, as shown in fig. 3, after the first human body image (I) is acquired, it is processed to obtain human body image analysis information (p*). Specifically, the human body image analysis information includes the human body mask map, the human body dense point segmentation map and the human body key point map, where the human body mask map is obtained by segmenting the human body in the first human body image and removing the upper-garment part; the human body dense point segmentation map consists of dense human body pose key points (DensePose) and is obtained by processing with a human semantic parser (Human Parsing); and the human body key point map is obtained by extracting human body key points with the pose estimator of a body tracking system (OpenPose). Optionally, other manners may also be used to obtain the human body key points, which is not limited in the embodiments of the present application.
Step S603: according to the human body image analysis information and the first clothes image, processing is carried out through an appearance flow deformation module, and first deformation information is generated;
Specifically, the human body image analysis information and the first clothes image are input into an appearance flow deformation module, and the appearance flow deformation module processes them to obtain first deformation information. The appearance flow deformation module includes a parser-based appearance flow warping module (PB-AFWM), which, as shown in fig. 3, processes the input human body image analysis information (p*) and the first clothes image (I_c) to obtain the first deformation information (u_f). The parser-based appearance flow warping module (PB-AFWM) contains an appearance flow warping module (AFWM) for predicting the dense correspondence between the clothes image and the person image in order to deform the clothes, where the output of the appearance flow warping module is an appearance flow, i.e. the first deformation information (u_f): a set of two-dimensional coordinate vectors, each vector indicating which pixels in the clothes image should be used to fill in a specific pixel of the person image.
Specifically, regarding the relevant processing procedure of the appearance flow morphing module (AFWM), reference may be made to the relevant description of the first embodiment, which is not repeated herein.
Step S604: acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a first deformed clothes image;
specifically, please refer to fig. 7, fig. 7 is a schematic diagram illustrating a virtual fitting method according to a second embodiment of the present application;
As shown in fig. 7, the second clothes image is the target garment image to be virtually tried on by the user. The second clothes image is deformed (warped) by the first deformation information (u_f) to obtain a first deformed clothes image, i.e. the first deformed clothes image is the image obtained after the second clothes image is deformed.
Step S605: acquiring a first hand image according to the first human body image, wherein the first hand image comprises hand information and arm information;
As shown in fig. 7, a first hand image is obtained by processing the first human body image (I), where the first hand image includes hand information and arm information. Specifically, the first human body image is processed by a neural network that identifies the hand information and the arm information, and the hand and arm regions are segmented from the first human body image to obtain the first hand image. The neural network includes a convolutional neural network, which is formed by alternately stacking convolutional layers, pooling layers and fully connected layers. The convolutional layer extracts the features of a local region, with different convolution kernels corresponding to different feature extractors. The pooling layer (also called the sub-sampling layer) performs feature selection and reduces the number of features, thereby reducing the number of parameters. Preferably, the convolutional neural network in the embodiment of the present application is a deep convolutional generative adversarial network.
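As an illustrative sketch (not the patent's exact network), the first hand image can be obtained by masking the person image with the hand and arm classes of a semantic segmentation output; the class indices used here are placeholders.

```python
import torch

def extract_hand_image(person_image, parsing_logits, hand_arm_classes=(14, 15)):
    """person_image: (N, 3, H, W); parsing_logits: (N, num_classes, H, W) from a segmentation network.

    hand_arm_classes are placeholder label indices for the arm/hand regions.
    """
    labels = parsing_logits.argmax(dim=1, keepdim=True)              # (N, 1, H, W) class labels
    mask = torch.zeros_like(labels, dtype=person_image.dtype)
    for cls in hand_arm_classes:
        mask = mask + (labels == cls).to(person_image.dtype)
    return person_image * mask                                       # first hand image
```

The second human body image of step S607 can be formed analogously, by zeroing out the clothes and arm classes instead of keeping them.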
Step S606: fusing the first deformed clothes image, the first hand image and the first human body image to determine a first fused clothes image;
Specifically, fusing the first deformed clothes image, the first hand image and the first human body image to determine the first fused clothes image includes:
and fusing the first deformed clothes image, the first hand image and the human body dense point segmentation image through the first generation module to determine a first fused clothes image.
The first generation module includes a parser-based generation module (PB-GM).
As shown in fig. 7, the inputs of the parser-based generation module (PB-GM) are the first deformed clothes image, the first hand image and the human body dense point segmentation map, and the output is the first fused clothes image.
In the embodiment of the present application, the clothes information, the hand information and the arm information are fused separately, so that the original characteristics of the clothes information, the hand information and the arm information are fully preserved and the fusion effect is improved.
Step S607: acquiring a second human body image according to the first human body image, wherein the second human body image is obtained by removing the clothes information and the arm information from the first human body image;
specifically, clothes information and arm information in the first human body image are identified through a neural network, and the clothes information and the arm information in the first human body image are removed to obtain a second human body image. Wherein the neural network comprises a convolutional neural network.
In an embodiment of the present application, the method further includes:
acquiring a generation countermeasure network, wherein the generation countermeasure network comprises a generation network and a judgment network;
and performing countermeasure training on the first generation module based on the generated countermeasure network.
A generative adversarial network (GAN) is a new framework that estimates a generative model through an adversarial process. A generative adversarial network refers to a deep generative model trained in an adversarial manner; the discrimination network and the generation network it contains can use different network structures depending on the generation task. In the GAN, the generation network and the discrimination network play a non-cooperative zero-sum game. The generation network of the GAN captures the latent distribution of real data samples and generates new data samples; the discrimination network of the GAN is a binary classifier that judges whether its input is real data or a generated sample; both the generation network and the discrimination network of the GAN can use a perceptron or a deep learning model. The optimization process of the GAN is a minimax game problem, and the optimization goal is to reach a Nash equilibrium, that is, until the discriminant model (Discriminator) cannot identify whether a fake sample produced by the generative model (Generator) is real or fake. When training finally converges, if the discrimination network can no longer judge the source of a sample, this is equivalent to the generation network being able to generate samples that conform to the real data distribution.
Specifically, a first loss function corresponding to the generation network and a second loss function of the discrimination network are obtained, and the first generation module is adversarially trained based on the first loss function and the second loss function.
In the embodiment of the application, adversarial training of the first generation module improves its generative capacity, giving it better information retention and fusion quality, which helps to generate realistic target dressing images.
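The adversarial training described above can be sketched with a standard generator/discriminator update loop. The sketch below uses plain binary cross-entropy GAN losses as stand-ins for the first and second loss functions; the module interfaces are assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adversarial_step(generator, discriminator, g_opt, d_opt, gen_inputs, real_image):
    # Discriminator update: second loss function (separate real images from generated ones).
    fake = generator(*gen_inputs).detach()
    real_logits = discriminator(real_image)
    fake_logits = discriminator(fake)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: first loss function (make generated samples look real to the discriminator).
    fake = generator(*gen_inputs)
    fake_logits = discriminator(fake)
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```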
Step S608: and fusing the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image to obtain a target dressing image.
Specifically, fusing the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image to obtain the target dressing image includes:
fusing the first deformed clothes image, the first fused clothes image, the human body dense point segmentation map and the second human body image through the first generation module to obtain the target dressing image. The first generation module comprises a parser-based generation module (PB-GM).
As shown in fig. 7, the inputs of the parser-based generation module (PB-GM) are the first deformed clothes image, the first fused clothes image, the human body dense point segmentation map and the second human body image, and the output is the target dressing image.
In the embodiment of the application, the first fused clothes image containing the hand information and the arm information is combined with the second human body image, from which the clothes information and the arm information have been removed, and with the human body dense point segmentation map. The hand information and the arm information are therefore better preserved and the fusion effect is improved; in particular, the fusion of the arm region improves in cases such as changing from long sleeves to short sleeves, which helps to generate a target dressing image with a realistic fusion effect.
In an embodiment of the present application, there is provided a virtual fitting method, including: acquiring a first human body image and a first clothes image; acquiring human body image analysis information according to the first human body image; according to the human body image analysis information and the first clothes image, processing is carried out through an appearance flow deformation module, and first deformation information is generated; acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a first deformed clothes image; acquiring a first hand image according to the first human body image, wherein the first hand image comprises hand information and arm information; fusing the first deformed clothes image, the first hand image and the first human body image to determine a first fused clothes image; acquiring a second human body image according to the first human body image, wherein the second human body image is obtained by removing the clothes information and the arm information from the first human body image; and fusing the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image to obtain a target dressing image.
The first hand image and the second human body image are obtained from the first human body image; the first fused clothes image is determined from the first deformed clothes image, the first hand image and the first human body image; and the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image are fused to obtain the target dressing image, so that the hand and arm information is preserved and the fusion effect is improved.
Referring to fig. 8 again, fig. 8 is a schematic structural diagram of a virtual fitting device according to a second embodiment of the present application;
the virtual fitting device is applied to a server, and particularly, the virtual fitting device is applied to one or more processors of the server.
As shown in fig. 8, the virtual fitting apparatus 80 includes:
an acquiring unit 801, configured to acquire a first human body image and a first clothes image, and to acquire human body image analysis information according to the first human body image;
a deformation unit 802, configured to process the human body image analysis information and the first clothes image through the appearance flow deformation module to generate first deformation information, and to acquire a second clothes image and deform the second clothes image through the first deformation information to generate a first deformed clothes image;
a fusion unit 803, configured to obtain a first hand image according to the first human body image, where the first hand image includes hand information and arm information; fusing the first deformed clothes image, the first hand image and the first human body image to determine a first fused clothes image; acquiring a second human body image according to the first human body image, wherein the second human body image is obtained by removing clothes information and arm information from the first human body image; and fusing the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image to obtain a target dressing image.
In the embodiment of the present application, the virtual fitting apparatus may also be built from hardware devices, for example, from one or more chips that work in coordination to complete the virtual fitting method described in the above embodiments. As another example, the virtual fitting apparatus may be built from various logic devices, such as a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single-chip microcomputer, an ARM (Acorn RISC Machine) processor or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components.
The virtual fitting device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The virtual fitting device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and embodiments of the present application are not specifically limited.
The virtual fitting device provided by the embodiment of the application can realize each process realized by fig. 6, and is not repeated here for avoiding repetition.
It should be noted that the virtual fitting apparatus can execute the virtual fitting method provided in the embodiments of the present application, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment of the virtual fitting apparatus, reference may be made to the virtual fitting method provided in the embodiments of the present application.
In an embodiment of the present application, there is provided a virtual fitting apparatus including: an acquisition unit configured to acquire a first human body image and a first clothes image, and to acquire human body image analysis information according to the first human body image; a deformation unit configured to process the human body image analysis information and the first clothes image through the appearance flow deformation module to generate first deformation information, and to acquire a second clothes image and deform the second clothes image through the first deformation information to generate a first deformed clothes image; and a fusion unit configured to acquire a first hand image according to the first human body image, wherein the first hand image comprises hand information and arm information, fuse the first deformed clothes image, the first hand image and the first human body image to determine a first fused clothes image, acquire a second human body image according to the first human body image, wherein the second human body image is obtained by removing the clothes information and the arm information from the first human body image, and fuse the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image to obtain a target dressing image. The first hand image and the second human body image are obtained from the first human body image, the first fused clothes image is determined from the first deformed clothes image, the first hand image and the first human body image, and the first deformed clothes image, the first fused clothes image, the first human body image and the second human body image are fused to obtain the target dressing image, which helps to preserve hand and arm information and improve the fusion effect.
EXAMPLE III
Referring to fig. 9, fig. 9 is a schematic flow chart of a virtual fitting method according to a third embodiment of the present application;
the virtual fitting method is applied to a server, and particularly, an execution main body of the virtual fitting method is one or more processors of the server.
As shown in fig. 9, the virtual fitting method includes:
step S901: acquiring a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images;
Specifically, the first target human body image is an image of a human body wearing clothes, that is, the first target human body image includes person information and clothes information. It is understood that the first target human body image further includes a background portion, and the person in the first target human body image may be in various postures, such as hands on the hips or arms hanging naturally, which is not limited by the embodiment of the present application.
In an embodiment of the present application, the first target human body image and the first target clothes image are both high-resolution images, that is, high-definition (HD) images, for example with resolutions of 512 × 384, 1024 × 768, 1280 × 720 or 1920 × 1080. It is to be understood that the resolutions of the first target human body image and the first target clothes image in the embodiment of the present application may also be other resolutions, which are not limited herein.
Step S902: zooming the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image;
specifically, the first target human body image and the first target clothes image are scaled to obtain the low-resolution image, that is, the high-resolution first target human body image is scaled to the low-resolution second target human body image, and the high-resolution first target clothes image is scaled to the low-resolution second target clothes image, for example: the second target human body image and the second target clothes image are both 256 × 192. It is to be understood that the resolution of the second target human body image is lower than that of the first target human body image, and the resolution of the second target clothes image is lower than that of the first target clothes image, and the resolutions of the second target human body image and the second target clothes image in the embodiment of the present application may also be other resolutions, which is not limited herein.
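As a small illustration of the scaling step, the sketch below downscales the pair with OpenCV; the resolutions are just the examples given above, and cv2.resize expects the size as (width, height).

```python
import cv2

def downscale_pair(first_target_human, first_target_clothes, size_wh=(192, 256)):
    """Scale the high-resolution inputs (e.g. 512x384) down to the low-resolution
    pair (e.g. 256x192); size_wh is (width, height) as expected by cv2.resize."""
    second_target_human = cv2.resize(first_target_human, size_wh, interpolation=cv2.INTER_AREA)
    second_target_clothes = cv2.resize(first_target_clothes, size_wh, interpolation=cv2.INTER_AREA)
    return second_target_human, second_target_clothes
```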
Step S903: generating a first target fusion image according to the second target human body image and the second target clothes image;
specifically, generating a first target fusion image according to the second target human body image and the second target clothes image includes:
and inputting the second target human body image and the second target clothes image into an appearance flow deformation module and a generation module to generate a first target fusion image.
Specifically, the step of inputting the second target human body image and the second target clothes image into the appearance flow deformation module and the generation module to generate the first target fusion image includes:
inputting the second target human body image and the second target clothes image into an appearance flow deformation module to generate second deformation information;
deforming the second target clothes image through the second deformation information to generate a third target clothes image;
and inputting the second target human body image and the third target clothes image into a generating module to generate a first target fusion image.
Referring to fig. 10, fig. 10 is a schematic diagram illustrating a principle of generating a first target fusion image according to a third embodiment of the present application;
As shown in fig. 10, the appearance flow deformation module comprises a parser-free appearance flow warping module (PF-AFWM), and the generation module comprises a parser-free generation module (PF-GM). The first target human body image and the first target clothes image are scaled (resize) to generate the second target human body image and the second target clothes image; the second target human body image and the second target clothes image are input to the parser-free appearance flow warping module (PF-AFWM) to obtain second deformation information (Sf); the second target clothes image (Ic) is deformed (warp) with the second deformation information (Sf) to generate the third target clothes image (Sw); and the second target human body image and the third target clothes image (Sw) are input to the parser-free generation module (PF-GM) to obtain the first target fusion image (output1).
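For orientation, the low-resolution branch of fig. 10 can be sketched as the pipeline below. The module call signatures, the flow layout expected by grid_sample, and the assumption that PF-GM takes the concatenated person and warped clothes images are all illustrative, not the patent's implementation.

```python
import torch
import torch.nn.functional as F

def low_resolution_fitting(first_human_hd, first_clothes_hd, pf_afwm, pf_gm, low_size=(256, 192)):
    """Sketch of fig. 10: resize -> PF-AFWM -> warp -> PF-GM -> first target fusion image."""
    # Resize the high-resolution inputs to the low-resolution working size.
    human_lr = F.interpolate(first_human_hd, size=low_size, mode='bilinear', align_corners=False)
    clothes_lr = F.interpolate(first_clothes_hd, size=low_size, mode='bilinear', align_corners=False)

    # Second deformation information (appearance flow) from the parser-free warping module.
    flow = pf_afwm(human_lr, clothes_lr)  # assumed shape (N, H, W, 2), values in [-1, 1]

    # Warp the low-resolution clothes image with the flow (third target clothes image).
    warped_clothes = F.grid_sample(clothes_lr, flow, mode='bilinear', align_corners=False)

    # Parser-free generation module produces the first target fusion image (output1).
    return pf_gm(torch.cat([human_lr, warped_clothes], dim=1))
```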
In an embodiment of the present application, the appearance flow deformation module includes a first appearance flow deformation module and a second appearance flow deformation module, wherein the first appearance flow deformation module comprises a parser-based appearance flow warping module (PB-AFWM) and the second appearance flow deformation module comprises a parser-free appearance flow warping module (PF-AFWM). The second appearance flow deformation module is used for generating the second deformation information, which is a set of two-dimensional coordinate vectors; each vector indicates which pixels in the clothes image should be used to fill a particular pixel in the person image.
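Read this way, the deformation information is a per-pixel sampling map. Below is a small illustrative sketch (nearest-neighbour lookup in NumPy, not the patent's warping operator) of filling each person-image pixel from the clothes image using such a field of 2D coordinates.

```python
import numpy as np

def warp_with_flow(clothes, flow):
    """clothes: (H, W, 3) image; flow: (H, W, 2) array where flow[y, x] gives the
    (x_src, y_src) coordinate in the clothes image that should fill pixel (x, y)."""
    xs = np.clip(np.round(flow[..., 0]).astype(int), 0, clothes.shape[1] - 1)
    ys = np.clip(np.round(flow[..., 1]).astype(int), 0, clothes.shape[0] - 1)
    return clothes[ys, xs]  # nearest-neighbour sampling of the clothes image
```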
Wherein, the method also comprises:
training a second appearance flow deformation module specifically comprises:
acquiring a first human body image and a first clothes image;
acquiring human body image analysis information according to the first human body image;
processing the human body image analysis information and the first clothes image through a first appearance flow deformation module to generate first deformation information;
acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a third clothes image;
fusing the third clothes image and the human body image analysis information to generate a first fitting image;
and training a second appearance flow deformation module according to the first clothes image and the first fitting image.
Wherein, the method also comprises:
and in the training process of the second appearance flow deformation module, knowledge distillation is carried out on the second appearance flow deformation module through the first appearance flow deformation module.
Specifically, please refer to fig. 11, fig. 11 is a schematic diagram illustrating a principle of training a second appearance flow transformation module according to a third embodiment of the present application;
as shown in FIG. 11, a first human body image (I) and a first clothes image (I) are acquiredc) Acquiring human body image analysis information (p) from the first human body image*) The human body image analysis information comprises: human body mask image, human body dense point segmentation image and human body key point image, and analyzing information (p) according to human body image*) And a first garment image (I)c) The first apparent flow deformation module is used for processing to generate first deformation information (u)f) Wherein the first apparent flow deformation module comprises: there is a parser-based appearance flow warping module (PB-AFWM),by the first deformation information (u)f) For the second clothes image
Figure BDA0003372425960000261
Deforming (warp) to obtain a first deformed garment image
Figure BDA0003372425960000262
Fusing the first deformed clothes image by a parser generating module (PB-GM)
Figure BDA0003372425960000266
And human body image analysis information (p)*) Obtaining a first fitting image
Figure BDA0003372425960000264
And based on the first fitting image
Figure BDA0003372425960000265
And a first garment image (I)c) Training a second appearance flow deformation module, wherein the second appearance flow deformation module comprises: there is no parser appearance flow warping module (PF-AFWM).
Specifically, during the training of the parser-free appearance flow warping module (PF-AFWM), knowledge distillation is performed through the parser-based appearance flow warping module (PB-AFWM), for example adjustable knowledge distillation.
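A minimal sketch of such a distillation step, in which the parser-free module (student) is pushed towards the flow predicted by the parser-based module (teacher). The module interfaces, the L1 distillation term and the weight lam are assumptions; the patent's adjustable knowledge distillation is not detailed here.

```python
import torch
import torch.nn.functional as F

def distillation_step(pb_afwm, pf_afwm, optimizer, human_image, parse_info, clothes_image, lam=1.0):
    """One training step: the parser-free student mimics the parser-based teacher's flow."""
    with torch.no_grad():
        teacher_flow = pb_afwm(parse_info, clothes_image)   # first deformation information (teacher)
    student_flow = pf_afwm(human_image, clothes_image)       # second deformation information (student)

    # Distillation term: bring the student flow close to the teacher flow.
    loss = lam * F.l1_loss(student_flow, teacher_flow)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```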
Step S904: generating a first target processing image according to the first target human body image, wherein the first target processing image does not comprise clothes information and arm information;
Specifically, the clothes information and the arm information in the first target human body image are identified through a neural network and removed to obtain the first target processing image, where the neural network includes a convolutional neural network; alternatively, the first target human body image is acquired and processed to obtain a human body mask map corresponding to the first target human body image, and the clothes information and the arm information in the first target human body image are removed using the human body mask map to obtain the first target processing image.
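The second option above (mask-based removal) reduces to a per-pixel multiply. In the sketch below the mask convention, where 1 marks clothes or arm pixels to be removed, is an assumption for illustration.

```python
import numpy as np

def apply_removal_mask(first_target_human, removal_mask):
    """removal_mask: (H, W) array with 1 where clothes/arm pixels should be removed, 0 elsewhere.
    Returns the first target processing image with those regions blanked."""
    keep = (1 - removal_mask)[..., None].astype(first_target_human.dtype)
    return first_target_human * keep
```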
Step S905: and inputting the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate a target dressing image.
Specifically, the first clothes model is a pre-trained model, and is configured to fuse an input first target fusion image, a first target clothes image, and a first target processing image to obtain a target dressing image, where the target dressing image is a final virtual dressing image, and is used to be presented to a terminal device of a user, for example: a mobile terminal or a fixed terminal.
In an embodiment of the present application, the method further includes:
pre-training a first garment model, specifically comprising:
acquiring a first data set, wherein the first data set comprises a first human body image and a first clothes image, and the first human body image and the first clothes image are both high-resolution images;
scaling a first human body image and a first clothing image in a first data set to generate a second data set, wherein the second data set comprises the scaled first human body image and the scaled first clothing image;
generating a third data set according to the scaled first human body image and the scaled first clothes image in the second data set, wherein the third data set comprises a first fusion image;
generating a first processed image according to the first human body image, wherein the first processed image does not include clothing information and arm information;
zooming the first fused image to generate a second fused image, wherein the second fused image has the same resolution as the first human body image;
inputting a first human body image, a first processing image and a second fusion image, and training a first clothes model.
Specifically, please refer to fig. 12 again, fig. 12 is a schematic diagram illustrating a principle of training a first garment model according to a third embodiment of the present application;
first, a first data set (data1) is acquired, wherein the first data set comprises a first human body image and a first clothes image, wherein the first human body image and the first clothes image are both high-resolution images, and the first data set further comprises a human body mask map;
The images in the first data set (data1) are scaled to obtain a low-resolution second data set (data2), for example: the resolution of the images in the first data set (data1) is 512 × 384, and the resolution of the images in the resulting second data set (data2) is 256 × 192;
A third data set is generated according to the scaled first human body image and the scaled first clothes image in the second data set, where the third data set includes the first fused images. Specifically, the scaled first human body image and the scaled first clothes image in the second data set are input into a pre-trained parser-free appearance flow warping module (PF-AFWM) and a pre-trained parser-free generation module (PF-GM) to obtain a plurality of first fused images (output1), and the plurality of first fused images form the third data set. For the specific processing procedure, reference may be made to the relevant description of the above embodiments and fig. 10, which is not repeated herein.
Then, generating a first processed image according to the first human body image and a human body mask image corresponding to the first human body image, wherein the first processed image does not include clothes information and arm information, namely the first processed image is an image without the clothes information and the arm information and is marked as (un-cloth);
the first fused image (output1) is then scaled (resize) to generate a second fused image (output2), wherein the second fused image (output2) has the same resolution as the first human body image, i.e. the first fused image (output1) is enlarged to the same resolution as the images in the first data set (data 1);
finally, a first human body image, a first processed image (un-cloth), and a second fused image (output2) are input, and a first clothing model (cloth-HD) for outputting a target dressing image of high resolution is trained. Specifically, the first clothing model (cloth-HD) may be trained using a common Unet structure generation model using a loss function, and the first human body image in the first data set (data1) may be trained as a learning object (label) during training. It will be appreciated that the loss function is a non-negative real function that quantifies the difference between the prediction tag and the true tag predicted by the model.
For example, the loss function is L = α × L1 loss + β × perceptual loss + γ × GAN loss. The L1 loss, i.e. the mean absolute error (MAE), is a loss function for regression models; it is the sum of the absolute differences between the target values and the predicted values, so it measures the average magnitude of the errors in a set of predictions regardless of their direction. The perceptual loss models human visual perception and is computed in a feature space: based on high-level features provided by a pre-trained network, it trains a feed-forward network to perform an image transformation task; during training the perceptual error measures the similarity between images, and the network can run in real time at test time. The GAN loss is the loss function of the generative adversarial network and is, as a whole, a cross-entropy loss; during training there are two neural networks and two loss functions, namely the generation network with its corresponding loss and the discrimination network with its corresponding loss. In an embodiment of the present application, the GAN loss comprises a relativistic GAN loss.
In the embodiment of the present application, α, β, and γ are all adjustable parameters, and the parameters are set according to specific requirements, for example: setting alpha to 0.1, beta to 0.1 and gamma to 2.
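A sketch of the combined loss with the example weights above. The VGG-19 feature-space perceptual term and the plain (non-relativistic) generator-side GAN term are common stand-ins, not necessarily the exact formulations used for cloth-HD.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class PerceptualLoss(nn.Module):
    """L1 distance in a pretrained VGG-19 feature space (a common perceptual loss)."""
    def __init__(self, num_layers: int = 16):
        super().__init__()
        self.features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:num_layers].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, pred, target):
        return nn.functional.l1_loss(self.features(pred), self.features(target))

l1_loss = nn.L1Loss()
perceptual_loss = PerceptualLoss()
bce = nn.BCEWithLogitsLoss()

def total_loss(pred, label, disc_logits_on_pred, alpha=0.1, beta=0.1, gamma=2.0):
    # L = alpha * L1 + beta * perceptual + gamma * GAN (generator side)
    gan_term = bce(disc_logits_on_pred, torch.ones_like(disc_logits_on_pred))
    return alpha * l1_loss(pred, label) + beta * perceptual_loss(pred, label) + gamma * gan_term
```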
In an embodiment of the present application, there is provided a virtual fitting method, including: acquiring a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images; zooming the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image; generating a first target fusion image according to the second target human body image and the second target clothes image; generating a first target processing image according to the first target human body image, wherein the first target processing image does not comprise clothes information and arm information; and inputting the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate a target dressing image.
The method obtains a high-resolution first target human body image and a high-resolution first target clothes image, scales them to obtain the second target human body image and the second target clothes image and generate the first target fusion image, generates the first target processing image that does not include clothes information and arm information according to the first target human body image, and inputs the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate the target dressing image.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a virtual fitting apparatus according to a third embodiment of the present application;
the virtual fitting device is applied to a server, and particularly, the virtual fitting device is applied to one or more processors of the server.
As shown in fig. 13, the virtual fitting apparatus 130 includes:
an obtaining unit 1301, configured to obtain a first target human body image and a first target clothes image, where the first target human body image and the first target clothes image are both high-resolution images;
a scaling unit 1302, configured to scale the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image;
a generating unit 1303, configured to generate a first target fusion image according to the second target human body image and the second target clothes image; generating a first target processing image according to the first target human body image, wherein the first target processing image does not comprise clothes information and arm information;
a fusion unit 1304, configured to input the first target fusion image, the first target clothing image, and the first target processing image into the first clothing model, and generate a target dressing image.
In the embodiment of the present application, the virtual fitting apparatus may also be built from hardware devices, for example, from one or more chips that work in coordination to complete the virtual fitting method described in the above embodiments. As another example, the virtual fitting apparatus may be built from various logic devices, such as a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single-chip microcomputer, an ARM (Acorn RISC Machine) processor or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components.
The virtual fitting device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The virtual fitting device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and embodiments of the present application are not specifically limited.
The virtual fitting device provided by the embodiment of the application can realize each process realized by fig. 9, and is not repeated here for avoiding repetition.
It should be noted that the virtual fitting apparatus can execute the virtual fitting method provided in the embodiments of the present application, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment of the virtual fitting apparatus, reference may be made to the virtual fitting method provided in the embodiments of the present application.
In an embodiment of the present application, there is provided a virtual fitting apparatus including: an acquisition unit configured to acquire a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images; a scaling unit configured to scale the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image; a generating unit configured to generate a first target fusion image according to the second target human body image and the second target clothes image, and to generate a first target processing image according to the first target human body image, wherein the first target processing image does not include clothes information and arm information; and a fusion unit configured to input the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate a target dressing image. The apparatus obtains a high-resolution first target human body image and a high-resolution first target clothes image, scales them to obtain the second target human body image and the second target clothes image and generate the first target fusion image, generates the first target processing image that does not include clothes information and arm information according to the first target human body image, and inputs the first target fusion image, the first target clothes image and the first target processing image into the first clothes model to generate the target dressing image.
Referring to fig. 14, fig. 14 is a schematic diagram of a hardware structure of a server according to an embodiment of the present application. Specifically, as shown in fig. 14, the server 140 includes at least one processor 1401 and a memory 1402 that are communicatively connected (in fig. 14, connection by a bus and one processor are taken as an example).
Processor 1401 is configured to provide computing and control capabilities to control server 140 to perform corresponding tasks, for example, control server 140 to perform a virtual fitting method in any of the above-described method embodiments.
It is to be appreciated that the processor 1401 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The memory 1402, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the virtual fitting method in the embodiments of the present application. The processor 1401 implements the virtual fitting method in any of the method embodiments described above by running the non-transitory software programs, instructions, and modules stored in the memory 1402. In particular, the memory 1402 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
In some embodiments, the memory 1402 may also include memory located remotely from the processor, which may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In some embodiments, the server 140 may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the server 140 may further include other components for implementing the functions of the device, which is not described herein again.
Embodiments of the present application also provide a computer-readable storage medium, such as a memory including program code executable by a processor to perform the virtual fitting method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Embodiments of the present application also provide a computer program product including one or more program codes stored in a computer readable storage medium. The processor of the server reads the program code from the computer-readable storage medium, and the processor executes the program code to perform the method steps of the virtual fitting method provided in the above-described embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by hardware associated with program code, and the program may be stored in a computer readable storage medium, where the above mentioned storage medium may be a read-only memory, a magnetic or optical disk, etc.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the present application as described above, which are not provided in detail for the sake of brevity; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A virtual fitting method, comprising:
acquiring a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images;
zooming the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image;
generating a first target fusion image according to the second target human body image and the second target clothes image;
generating a first target processing image according to the first target human body image, wherein the first target processing image does not include clothes information and arm information;
and inputting the first target fusion image, the first target clothes image and the first target processing image into a first clothes model to generate a target dressing image.
2. The method of claim 1, further comprising: pre-training a first garment model, specifically comprising:
acquiring a first data set, wherein the first data set comprises a first human body image and a first clothes image, and the first human body image and the first clothes image are both high-resolution images;
scaling a first human body image and a first clothing image in the first data set to generate a second data set, wherein the second data set comprises a scaled first human body image and a scaled first clothing image;
generating a third data set according to the scaled first human body image and the scaled first clothes image in the second data set, wherein the third data set comprises a first fusion image;
generating a first processed image from the first human body image, wherein the first processed image does not include clothing information and arm information;
scaling the first fused image to generate a second fused image, wherein the second fused image has the same resolution as the first human body image;
and inputting the first human body image, the first processing image and the second fusion image, and training a first clothes model.
3. The method according to claim 1, wherein generating a first target fusion image from the second target body image and the second target clothing image comprises:
and inputting the second target human body image and the second target clothes image into an appearance flow deformation module and a generation module to generate a first target fusion image.
4. The method according to claim 3, wherein said inputting the second target human body image and the second target clothing image to an appearance flow warping module and a generating module, generating a first target fusion image, comprises:
inputting the second target human body image and the second target clothes image into the appearance flow deformation module to generate second deformation information;
deforming the second target clothes image through the second deformation information to generate a third target clothes image;
and inputting the second target human body image and the third target clothes image into the generating module to generate the first target fusion image.
5. The method of claim 4, wherein the apparent flow deformation module comprises: a first appearance flow warping module and a second appearance flow warping module, wherein the second appearance flow warping module is configured to generate the second warping information, the method further comprising:
training a second appearance flow deformation module specifically comprises:
acquiring a first human body image and a first clothes image;
acquiring human body image analysis information according to the first human body image;
processing the human body image analysis information and the first clothes image through a first appearance flow deformation module to generate first deformation information;
acquiring a second clothes image, and deforming the second clothes image through the first deformation information to generate a third clothes image;
fusing the third clothes image and the human body image analysis information to generate a first fitting image;
and training the second appearance flow deformation module according to the first clothes image and the first fitting image.
6. The method of claim 5, further comprising:
and in the training process of the second appearance flow deformation module, knowledge distillation is carried out on the second appearance flow deformation module through the first appearance flow deformation module.
7. The method of claim 5 or 6, wherein the first appearance flow warping module comprises a parser-based appearance flow warping module and the second appearance flow warping module comprises a parser-free appearance flow warping module.
8. A virtual fitting apparatus, comprising:
an acquisition unit, configured to acquire a first target human body image and a first target clothes image, wherein the first target human body image and the first target clothes image are both high-resolution images;
the scaling unit is used for scaling the first target human body image and the first target clothes image to generate a second target human body image and a second target clothes image;
the generating unit is used for generating a first target fusion image according to the second target human body image and the second target clothes image; generating a first target processing image according to the first target human body image, wherein the first target processing image does not include clothes information and arm information;
and the fusion unit is used for inputting the first target fusion image, the first target clothes image and the first target processing image into a first clothes model to generate a target dressing image.
9. A server, comprising:
a memory and one or more processors configured to execute one or more computer programs stored in the memory, wherein the one or more processors, when executing the one or more computer programs, cause the server to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-7.
CN202111404834.5A 2021-11-24 2021-11-24 Virtual fitting method, device, server and storage medium Pending CN114170403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111404834.5A CN114170403A (en) 2021-11-24 2021-11-24 Virtual fitting method, device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111404834.5A CN114170403A (en) 2021-11-24 2021-11-24 Virtual fitting method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN114170403A true CN114170403A (en) 2022-03-11

Family

ID=80480361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111404834.5A Pending CN114170403A (en) 2021-11-24 2021-11-24 Virtual fitting method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN114170403A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117575746A (en) * 2024-01-17 2024-02-20 武汉人工智能研究院 Virtual try-on method and device, electronic equipment and storage medium
CN117575746B (en) * 2024-01-17 2024-04-16 武汉人工智能研究院 Virtual try-on method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination