CN112330580A - Method, device, computing equipment and medium for generating human body clothes fusion image - Google Patents
Method, device, computing equipment and medium for generating human body clothes fusion image
- Publication number
- CN112330580A (application CN202011192303.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- human body
- network
- segmentation map
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000004927 fusion Effects 0.000 title claims abstract description 109
- 238000000034 method Methods 0.000 title claims abstract description 57
- 230000011218 segmentation Effects 0.000 claims abstract description 146
- 238000013528 artificial neural network Methods 0.000 claims description 15
- 238000005070 sampling Methods 0.000 claims description 12
- 230000015654 memory Effects 0.000 claims description 9
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 238000012549 training Methods 0.000 claims description 6
- 230000007246 mechanism Effects 0.000 claims description 5
- 238000013473 artificial intelligence Methods 0.000 abstract description 6
- 238000004891 communication Methods 0.000 description 11
- 238000005516 engineering process Methods 0.000 description 10
- 238000010586 diagram Methods 0.000 description 9
- 238000012545 processing Methods 0.000 description 9
- 230000000694 effects Effects 0.000 description 7
- 238000013527 convolutional neural network Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000006870 function Effects 0.000 description 3
- 210000004209 hair Anatomy 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 230000003936 working memory Effects 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 210000000887 face Anatomy 0.000 description 2
- 210000003423 ankle Anatomy 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000010267 cellular communication Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 210000003467 cheek Anatomy 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 210000005069 ears Anatomy 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 210000001508 eye Anatomy 0.000 description 1
- 210000002683 foot Anatomy 0.000 description 1
- 210000001061 forehead Anatomy 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 210000004247 hand Anatomy 0.000 description 1
- 210000003128 head Anatomy 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000003924 mental process Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 230000036544 posture Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 210000000707 wrist Anatomy 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Multimedia (AREA)
- Accounting & Taxation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Finance (AREA)
- Human Computer Interaction (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Image Analysis (AREA)
Abstract
The disclosure provides a method, a device, a computing device and a medium for generating a human body clothing fusion image, and relates to the technical field of artificial intelligence, in particular to computer vision. The method comprises the following steps: generating a first human body segmentation map from a first image, wherein the first image is an image of a first human body wearing a first garment, and the first human body segmentation map identifies the human body parts of the first human body in the first image; generating a second human body segmentation map from the first image and a second image containing a second garment, wherein the second human body segmentation map identifies the human body parts of the first human body after the second garment is worn; and inputting the first image, the second image, the first human body segmentation map and the second human body segmentation map into an image fusion network to generate a fusion image of the first human body wearing the second garment.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to computer vision, and more particularly to a method, an apparatus, a computing device and a medium for generating a human body clothing fusion image.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it spans both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing and knowledge graph technologies.
With the development of virtual fitting and virtual clothes changing, the problem of how to effectively generate a fusion image of a human body and clothes is receiving more and more attention. Virtual fitting requires synthesizing a target garment onto a target person in order to show the effect of that person wearing the garment. Traditional fusion methods cannot handle mismatches between the garment and the person's pose, nor can they repair the skin regions exposed when the target garment and the garment originally worn by the target person differ in size.
Disclosure of Invention
According to one aspect of the present disclosure, a method of generating a human body clothing fusion image is disclosed. The method may include generating a first human body segmentation map from a first image. The first image is an image of a first human body wearing a first garment, and the first human body segmentation map identifies a human body part of the first human body in the first image. The method may further include generating a second human body segmentation map from the first image and a second image containing a second garment, the second human body segmentation map identifying the human body part of the first human body after the second garment is worn. The method may further include inputting the first image, the second image, the first human body segmentation map and the second human body segmentation map into an image fusion network to generate a fusion image of the first human body wearing the second garment.
According to another aspect of the present disclosure, an apparatus for generating a fused image of a human body garment is disclosed. The apparatus may comprise a first segmentation unit configured to generate a first body segmentation map from a first image, the first image being an image of a first body wearing a first garment, the first body segmentation map identifying different body parts of the first body in the first image. The apparatus may further comprise a second segmentation unit configured to generate a second body segmentation map from the first image and a second image comprising a second garment, the second body segmentation map identifying different body parts of the first body after wearing the second garment. The apparatus may further include an image fusion unit configured to generate a fused image of the first person wearing the second garment from the second image, the first body segmentation map and the second body segmentation map using an image fusion network.
According to another aspect of the disclosure, a computing device is disclosed that may include: a processor; and a memory storing a program comprising instructions which, when executed by the processor, cause the processor to perform the above-described method for generating a body-clothing-fusion image.
According to yet another aspect of the present disclosure, a computer-readable storage medium storing a program is disclosed, the program may include instructions which, when executed by a processor of a server, cause the server to perform the above-described method for generating a human body clothing fusion image.
Drawings
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of generating a body clothing fusion image according to an embodiment of the present disclosure;
FIGS. 3(a)-3(e) are diagrams of example images to be processed and generated according to embodiments of the present disclosure;
FIG. 4 is a flow chart of a method for generating a fused image of a human body garment according to another embodiment of the present disclosure;
FIG. 5 is an example neural network structure diagram for generating a human segmentation map, according to an embodiment of the present disclosure;
FIG. 6 is an exemplary diagram of an image fusion network for generating a body clothing fusion image according to an embodiment of the present disclosure;
fig. 7 is a block diagram illustrating a structure of an apparatus for generating a fusion image of human clothing according to an embodiment of the present disclosure; and
FIG. 8 illustrates a block diagram of an exemplary server and client that can be used to implement embodiments of the present disclosure.
Detailed Description
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the method of generating a body-clothing fusion image according to the present disclosure.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may use the client device 101, 102, 103, 104, 105, and/or 106 to enable generation of a body clothing fusion image. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing system in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The data store 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 130 may be of different types. In certain embodiments, the data store used by the server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to the command.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
A flowchart of a method for generating a body-clothing fusion image according to an embodiment of the present disclosure is described below with reference to fig. 2. For example, using the method shown in FIG. 2, a fused image may be generated for the person in FIG. 3(a) and the jacket in FIG. 3 (b). This can be used in a scene such as virtual fitting.
At step S201, a first human segmentation map is generated from the first image. The first image is an image of a first person wearing a first garment. The first body segmentation map identifies a body part of a first body in the first image. The first human body here is a target human body for fusion.
At step S202, a second body segmentation map is generated from the first image and a second image containing a second garment. The second body segmentation map identifies a body part of the first body after wearing the second garment.
At step S203, the first image, the second image, the first human body segmentation map and the second human body segmentation map are input into an image fusion network, and a fusion image of the first human body wearing the second garment is generated.
Through the steps described with reference to fig. 2, the first human body segmentation map and the second human body segmentation map, namely the segmentation map of the original human body and the segmentation map of the human body fused with the target garment, are generated first, and then these segmentation maps, which identify the human body parts, are used to generate the final image result. As a result, the position at which the human body and the garment are fused is more accurate and the image fusion effect is improved.
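As a concrete, non-limiting illustration of steps S201 to S203, the following minimal sketch (written in Python with PyTorch-style tensors; the function name, network handles and tensor shapes are assumptions made for this illustration rather than features of the disclosed embodiments) shows how the three stages fit together:

```python
# Minimal sketch of the three-step pipeline (S201-S203); module names,
# signatures and tensor shapes are illustrative assumptions, not the
# patent's actual implementation.
import torch

def generate_fusion_image(first_image: torch.Tensor,       # (1, 3, H, W) person wearing garment 1
                          second_image: torch.Tensor,      # (1, 3, H, W) target garment 2
                          seg_net, seg_fusion_net, image_fusion_net) -> torch.Tensor:
    # S201: segmentation map of the person as-is (hypothetical seg_net)
    first_seg = seg_net(first_image)                        # (1, K, H, W) part labels

    # S202: predicted segmentation of the same person wearing garment 2
    second_seg = seg_fusion_net(first_image, second_image)  # (1, K, H, W)

    # S203: fuse everything into the final try-on image
    fused = image_fusion_net(first_image, second_image, first_seg, second_seg)
    return fused                                            # (1, 3, H, W)
```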
The above method steps are further described in conjunction with FIGS. 3(a)-3(d). Fig. 3(a) is an example of a first image showing a first human body wearing a dark letter-print jacket (the original garment). The first human body here is the target human body for fusion. At step S201, a first human body segmentation map of the first human body wearing the first garment is obtained based on this first image. The obtained first human body segmentation map may be as shown in fig. 3(c).
Fig. 3(b) is an example of a second image, showing a white polka-dot jacket as an example of the target garment, that is, the second garment. At step S202, a second human body segmentation map of the first human body after wearing the second garment is obtained based on the first image and the second image containing the second garment. Fig. 3(d) is an example of the second human body segmentation map, showing the body part segmentation result after the person wearing the dark letter-print jacket in the above example puts on the white polka-dot jacket.
Here, the order of steps S201 and S202 is not limited to this, and step S202 may be performed first and then step S201 may be performed, or both may be performed in parallel.
At step S203, the first image, the second image, the first human body segmentation map and the second human body segmentation map are input into an image fusion network, and a fusion image of the first human body wearing the second garment is generated. Fig. 3(e) shows an example of such a fusion image, in which the first human body, that is, the target human body, wears the white polka-dot jacket representing the target garment.
The image fusion network may take the form of any of various neural networks trained to perform image fusion. For example, the image fusion network may be a convolutional neural network. According to some embodiments, the image fusion network may be a PixtoPix network, also referred to as a Pix2Pix or Pixel2Pixel network. The PixtoPix network may output the generated picture together with the real picture for a discriminator to distinguish; adopting the PixtoPix architecture is therefore beneficial to training the model and enhancing the realism of the generated pictures. According to some embodiments, the image fusion network is obtained by training it as the generator side of a generative adversarial network (GAN). Training against a discriminator can increase the realism of the generative model.
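For illustration only, the following is a minimal sketch of a Pix2Pix-style conditional "patch" discriminator in PyTorch; the layer sizes and the choice of a patch discriminator are assumptions of this sketch rather than requirements of the disclosure:

```python
# A minimal PatchGAN-style conditional discriminator, as commonly paired with
# Pix2Pix generators; channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, padding=1),   # one real/fake score per patch
        )

    def forward(self, conditioning: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # The candidate image (generated or real) is judged together with the
        # conditioning inputs, as in conditional GAN training.
        return self.net(torch.cat([conditioning, image], dim=1))
```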
According to some embodiments, the image fusion network is a fusion of a PixtoPix network and a U-Net network. A U-Net uses a symmetrical U-shaped structure containing a contracting path and an expanding path. A typical U-Net consists of convolutional layers, down-sampling layers, up-sampling (deconvolution) layers and activation layers, with corresponding up-sampling and down-sampling layers connected to each other. Adopting a U-Net structure preserves detail information at different scales and increases the realism of hair, facial details and the like in the generated fusion image.
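A minimal sketch of such a U-shaped encoder-decoder with skip connections is given below for illustration; depths, channel counts and activation choices are assumptions of the sketch, not the configuration of the disclosed embodiments:

```python
# A tiny U-Net-style generator: two down-sampling steps, two up-sampling
# steps, and a skip connection between the matching resolutions.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(inplace=True))
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(inplace=True))
        # skip connection: the decoder sees its own features plus the matching encoder features
        self.up2 = nn.ConvTranspose2d(32 + 32, out_ch, 4, 2, 1)

    def forward(self, x):
        d1 = self.down1(x)              # H/2, keeps fine detail for the skip
        d2 = self.down2(d1)             # H/4, coarse features
        u1 = self.up1(d2)               # back to H/2
        return torch.tanh(self.up2(torch.cat([u1, d1], dim=1)))  # fuse skip, back to H
```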
The first and second human body segmentation maps identify human body parts, which makes it possible to generate content differently at different body parts. A body segmentation map may identify different parts of the body with different numerical values, for example a value of "1" for the face, "2" for the neck, "3" for the jacket and "4" for the arms, although the present disclosure is not limited thereto. The body segmentation map only needs to distinguish the desired regions and does not show specific details or textures; during image fusion it can function as a mask. For example, each of the first and second human body segmentation maps may identify at least one of the following body parts: face, neck, arms, hands, shoulders, torso, legs, feet. This facilitates detailed processing and image generation for different regions. According to some embodiments, each of the first and second human body segmentation maps identifies the portion covered by clothing and the skin portion, i.e., distinguishes garment regions from non-garment regions. This benefits image processing around the boundary between clothing and skin, so that the generated image has richer detail and a more realistic effect. Depending on the usage scenario, the human body segmentation map may also identify other parts; for example, when the target item to be fused is a hat or jewelry, the segmentation map may identify facial parts in detail, such as hair, forehead, cheeks, ears and eyes.
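For illustration, the following sketch shows how such a label-encoded segmentation map can be turned into per-part masks and one-hot channels; the specific label values follow the example above, and NumPy is an assumed choice:

```python
# Label-encoded body segmentation map used as a mask; label values
# (face=1, neck=2, jacket=3, arm=4) follow the example in the text.
import numpy as np

seg = np.array([[0, 1, 1],
                [2, 3, 3],
                [4, 3, 3]])            # toy 3x3 segmentation map, 0 = background

FACE, NECK, JACKET, ARM = 1, 2, 3, 4

jacket_mask = (seg == JACKET)          # boolean mask of the garment region
skin_mask = np.isin(seg, [FACE, NECK, ARM])

# one-hot channels, the form usually fed to a network
num_parts = 5
one_hot = np.eye(num_parts)[seg]       # shape (3, 3, 5)
```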
According to some embodiments, the image fusion network employs an attention mechanism in the up-sampling layers. An attention mechanism is realized by an attention function or attention model that learns the importance of each element of a sequence, merges the elements according to their importance, and learns attention weights for the attended regions. Attention allows more computational resources to be devoted to the important target regions, extracting more detailed information about the target of interest while suppressing irrelevant information. By adopting attention in image fusion, the network can focus on the specific regions to be generated, which is particularly beneficial for generating skin near boundary regions such as the arms or the neck.
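As an illustrative sketch only, a generic spatial self-attention block of the kind that can be inserted into an up-sampling layer is shown below; it is not the specific attention design of the disclosed embodiments:

```python
# Generic spatial self-attention: each output position takes a weighted sum
# over features at all positions, so skin regions can borrow nearby texture.
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned blend weight

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # (n, hw, c/8)
        k = self.key(x).flatten(2)                            # (n, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)                   # (n, hw, hw) importance weights
        v = self.value(x).flatten(2)                          # (n, c, hw)
        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)     # weighted sum over positions
        return self.gamma * out + x
```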
According to some embodiments, generating the fusion image of the first human body wearing the second garment further comprises: generating a preliminary fusion image and a mask using the second image, the first human body segmentation map and the second human body segmentation map, and performing alpha fusion of the preliminary fusion image and the first image through the mask to generate the fusion image. Alpha fusion is the process of superimposing a foreground onto a background according to transparency. The preliminary fusion image may be a three-channel image, and the transparency mask may be referred to as an alpha mask. Using a mask and alpha fusion can further increase the realism of details in the generated picture.
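A minimal sketch of the alpha fusion step is given below for illustration; tensor shapes are assumptions:

```python
# Alpha fusion: composite the generated three-channel output with the
# original photo through a predicted transparency (alpha) mask.
import torch

def alpha_fuse(preliminary: torch.Tensor,   # (1, 3, H, W) generated foreground
               original: torch.Tensor,      # (1, 3, H, W) first image (background)
               alpha_mask: torch.Tensor) -> torch.Tensor:  # (1, 1, H, W), values in [0, 1]
    # where the mask is 1 the generated pixels are kept;
    # where it is 0 the original photo shows through unchanged
    return alpha_mask * preliminary + (1.0 - alpha_mask) * original
```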
According to some embodiments, the second image is an image of the second garment worn on a second human body. Since the second image need not show the garment alone but may show it on a person, the application scenarios become broader: beyond virtual fitting, for example, images of two people exchanging clothes can be generated.
According to some embodiments, inputting the first image and the first human body segmentation map into the image fusion network further comprises: removing the region where the first garment is located from the first human body segmentation map to generate a first human body segmentation map without garment information; and inputting the generated first human body segmentation map without garment information into the image fusion network. Removing the interfering garment region before it enters the image fusion network reduces the amount of computation and the interference, yielding a more accurate fusion result.
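For illustration, the following sketch removes the region of the first garment from the segmentation map (and, optionally, from the person image); the label value and array layout are assumptions of the sketch:

```python
# Strip the original garment before fusion: pixels labelled as the first
# garment are cleared in the segmentation map and blanked in the photo.
import numpy as np

JACKET_LABEL = 3                               # assumed label for the original garment

def remove_garment(first_seg: np.ndarray,      # (H, W) integer part labels
                   first_image: np.ndarray):   # (H, W, 3) person photo
    garment = first_seg == JACKET_LABEL
    seg_no_garment = np.where(garment, 0, first_seg)   # garment region -> background
    image_no_garment = first_image.copy()
    image_no_garment[garment] = 0                       # blank out the garment pixels
    return seg_no_garment, image_no_garment
```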
According to some embodiments, inputting the second image, the first human body segmentation map and the second human body segmentation map into the image fusion network comprises: inputting the second image and the first human body segmentation map into the image fusion network at the down-sampling stage, and inputting the second human body segmentation map into the image fusion network at the up-sampling stage. Inputting the second human body segmentation map at the up-sampling stage provides segmentation information about the positions where the human body and the garment are fused, so that the fusion result is more accurate.
According to some embodiments, the second human body segmentation map is input to the image fusion network through a SPADE (spatially-adaptive normalization) algorithm. The SPADE algorithm can generate a corresponding realistic image from a semantic segmentation map. Combining the SPADE algorithm with the generated human body segmentation map provides segmentation information about the positions where the human body and the garment are fused, so that an accurate and realistic fusion image is generated.
According to some embodiments, the image fusion network has a plurality of up-sampling layers, and inputting the second human body segmentation map into the image fusion network comprises: inputting the second human body segmentation map into the image fusion network through the SPADE algorithm between each pair of adjacent up-sampling layers. Fusing the segmentation map once after each decoding step increases the realism of the generated image.
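As an illustrative sketch, a SPADE-style block that could sit between two adjacent up-sampling layers is shown below; the channel sizes are assumptions, and the block is a generic rendering of the SPADE idea rather than the exact layer of the disclosed embodiments:

```python
# SPADE-style normalization: the second body segmentation map, resized to the
# current feature resolution, predicts per-pixel scale and bias that modulate
# the normalized decoder features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADEBlock(nn.Module):
    def __init__(self, feat_channels: int, seg_channels: int, hidden: int = 64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(seg_channels, hidden, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        seg = F.interpolate(seg, size=feat.shape[2:], mode='nearest')  # match resolution
        s = self.shared(seg)
        gamma, beta = self.to_gamma(s), self.to_beta(s)
        return self.norm(feat) * (1 + gamma) + beta   # spatially-adaptive modulation
```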
According to some embodiments, generating the first human body segmentation map comprises: generating the first human body segmentation map from the first image using a first neural network different from the image fusion network. Using a neural network allows the human body segmentation map to be generated efficiently and accurately, thereby enabling efficient generation of the final fusion image.
According to some embodiments, generating the second human body segmentation map comprises: generating the second human body segmentation map from the first image and the second image using a second neural network different from the image fusion network. Any neural network trained to identify the body parts of a human body after it is fused with a garment, given separate images of the target human body and the target garment, may be used to generate the second human body segmentation map. Using such a network, the body segmentation regions after the garment and the human body are fused can be generated accurately from the images of the target human body and the target garment, providing a more accurate fusion position for generating the final picture.
Fig. 4 is a flowchart of a method for generating a body-clothing fusion image according to another embodiment of the present disclosure. For example, a fused image may be generated for the person in fig. 3(a) and the jacket in fig. 3(b) using the method shown in fig. 4.
At step S401, a human body segmentation map of the first human body wearing the first garment, referred to for distinction as the first human body segmentation map, is obtained based on a first image of the first human body wearing the first garment. Fig. 3(a) is an example of the first image, showing a first human body wearing a dark letter-print jacket (the original garment). The first human body here is the target human body for fusion. The obtained first human body segmentation map may be as shown in fig. 3(c).
Step S401 may be implemented using any segmentation model or CNN. The obtained first human body segmentation map may be a label map that encodes the regions of the human body parts; for example, different regions of the human body may be represented in the first human body segmentation map with different numerical values. The different regions may include the face, neck, arms, jacket, and so on.
At step S402, a human body segmentation map of the first human body after wearing the second garment, referred to herein as the second human body segmentation map, is obtained based on the first image and the second image containing the second garment. Fig. 3(b) is an example of the second image, showing a white polka-dot jacket as an example of the target garment, that is, the second garment. The second image may also be an image of the garment alone rather than of the garment worn on a person. The second human body segmentation map has a format similar to that of the first human body segmentation map, but it identifies the body parts exhibited when the second garment is fused with the first human body, i.e., when the second garment is worn on the first human body. Continuing the above example, step S402 yields the human body segmentation map, shown in fig. 3(d), of the person originally wearing the dark letter-print jacket after putting on the white polka-dot jacket. The first garment and the second garment may differ in shape, style, color and so on; for example, besides the differences in pattern and color, it can be seen that the sleeves of the first garment are shorter than those of the second garment.
The second human body segmentation map may be generated by a neural network such as the one shown in fig. 5. For example, a CNN may be trained so that, given the human body key points, the human body segmentation map and the garment image as input, it outputs the human body segmentation map of the target human body wearing the target garment. The neural network of fig. 5 is a two-branch U-Net CNN, but the present disclosure is not limited thereto; any method or model that can generate body part segmentation results for a garment and a human body may be applied here.
For example, the second human body segmentation map may be generated by a model including a feature-matching neural network. The model may have two inputs: the first input receives the target garment image and extracts the target garment features, and the second input receives the target human body image and extracts the target human body features. After operations such as convolution and up-sampling, a region segmentation image of the target human body combined with the target garment can be obtained.
The combined region segmentation image may be regarded as an effect map of the target person "wearing" the target garment. On the one hand, part features of the target human body, such as the head, neck, shoulders and arms, and the positions of these parts in the target human body image, can be obtained by extracting the human body key points and the human body segmentation map of the target human body. On the other hand, style features of the target garment, such as whether it has long or short sleeves, a round or V-shaped neck, and the positions of the collar, cuffs and hem in the target garment image, may be extracted. Based on the extracted features, the target garment is combined with the target human body to obtain masks of the portions of the target human body covered by the target garment, as shown by the output on the right side of fig. 5; the mask corresponds to the hatched portion on the target human body on the right side of fig. 5. In this embodiment, the mask may be used as the region segmentation image after the target human body and the target garment are combined.
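For illustration only, a minimal two-branch sketch is given below; the encoder/decoder depths and the way the person inputs are packed are assumptions of the sketch, not the structure of fig. 5:

```python
# Two-branch idea: one branch encodes the target garment, the other encodes
# the target person (image, key-point heat maps, current segmentation); the
# merged features are decoded into the predicted post-try-on segmentation.
import torch
import torch.nn as nn

class TwoBranchSegPredictor(nn.Module):
    def __init__(self, person_ch: int, garment_ch: int, num_parts: int):
        super().__init__()
        self.person_enc = nn.Sequential(nn.Conv2d(person_ch, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.garment_enc = nn.Sequential(nn.Conv2d(garment_ch, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_parts, 3, padding=1),          # per-part logits
        )

    def forward(self, person_inputs, garment_image):
        # person_inputs: person image, key-point heat maps and first segmentation
        # map stacked along the channel axis; garment_image: target garment photo
        f = torch.cat([self.person_enc(person_inputs), self.garment_enc(garment_image)], dim=1)
        return self.decoder(f)   # argmax over channels gives the second segmentation map
```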
Here, the order of steps S401 and S402 is not limited to this, and step S402 may be performed first and then step S401 may be performed, or both may be run in parallel.
At step S403, an image fusion network is employed. The first image, the second image and the first human body segmentation map are input into the image fusion network at the down-sampling layers, and the second human body segmentation map is input into the image fusion network through the up-sampling layers, to generate the desired target fusion image. An example of the generated target fusion image is shown in fig. 3(e), in which the first human body, that is, the target human body, wears the white polka-dot jacket representing the target garment. The image fusion network may be a neural network, for example a CNN, and may adopt the PixtoPix basic architecture. An image of the person with the region of the first garment removed may be obtained from the first human body segmentation map and the first image and used as an input to the network; this step can also be performed inside the image fusion network. A SPADE layer may be added to the up-sampling layers, and the second human body segmentation map is input through the SPADE layer, so that the second human body segmentation map serves as position information.
The image fusion network may further adopt a fused architecture of PixtoPix and U-Net networks (not shown). For example, the image fusion network may have a plurality of up-sampling layers and a corresponding number of down-sampling layers. Pictures are progressively encoded in the down-sampling layers and progressively decoded in the up-sampling layers, and corresponding up-sampling and down-sampling layers are connected to each other; for example, the up-sampling layer at the largest resolution may be connected to the down-sampling layer at the largest resolution. Such a network structure has at least the advantage that details such as faces and hair, which would otherwise be degraded by being encoded and then decoded, can be preserved and refined through these connections, making the face, for example, clearer.
Fig. 6 shows an example of an image fusion network 600. The image of the second garment, i.e., the target garment, is input to the image fusion network 600 as a first input. The first image and the first human body segmentation map form the basis of a second input: the region where the first garment is located is removed from the first image according to the segmentation result of the first human body segmentation map, and the resulting garment-free image is fed into the network. For example, as shown in fig. 6, the second input to the network 600 is an intermediate picture generated from the first image of the first human body with the area occupied by the dark letter-print jacket removed. The first input and the second input may be encoded by the down-sampling layers of the image fusion network 600.
The image fusion network may employ attention in the up-sampling layers. The effect of attention here is to let a pixel take values from other pixels, e.g. neighboring regions, by learning weights. For example, when the first garment is a long-sleeved jacket and the second garment is a short-sleeved jacket, texture may be taken from near the arm position to complete the skin that must be generated around the arm. Such processing makes the generated image more realistic, and in particular helps with generating skin at positions such as the arms, neck, wrists and ankles.
As shown in fig. 6, the second human body segmentation map may be input into the network multiple times through SPADE layers to provide the required generation information at different pixel resolutions. Each input fuses the second human body segmentation map with the data in the corresponding up-sampling layer, providing position information about where the garment should be fused. The multiple SPADE layers may each perform one fusion after each decoding step.
Further, in addition to generating a 3-channel picture, the image fusion network may also generate a mask that marks position information. Thereafter, the final picture is obtained by alpha-fusing the 3-channel image with the original image, for example by multiplying the mask with the garment to obtain a texture and then superimposing the texture on the 3-channel picture. Through such a design, regions such as the trousers and the head are guaranteed to remain unchanged.
The above network may be used as the generator side of a generative adversarial network (GAN). The final picture and the real picture can be input into a discriminator for judgment, so as to further train the generative model.
For example, referring to fig. 6, two pictures, namely the ground truth and the output of the model, can be produced through the PixtoPix network. The ground-truth picture may be generated by exchanging the first image and the second image twice. The two pictures can be distinguished by the discriminator of the GAN, so that the generative model is further optimized. Through such training, the generated images become more realistic and a more effective generative model is obtained. The discriminator side of the GAN may be any existing image discriminator, and the present disclosure is not limited in this respect.
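A minimal sketch of one adversarial training step for such a generator is given below for illustration; the loss weights (including the Pix2Pix-style L1 weight of 100) and the conditioning passed to the discriminator are assumptions of this sketch, not the training recipe of the disclosed embodiments:

```python
# One adversarial training step: update the discriminator on a real pair and a
# generated pair, then update the generator to fool the discriminator while
# staying close to the ground-truth image.
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, inputs, real_image):
    # --- discriminator update -------------------------------------------------
    fake_image = generator(*inputs).detach()
    d_real = discriminator(inputs[0], real_image)
    d_fake = discriminator(inputs[0], fake_image)
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- generator update -----------------------------------------------------
    fake_image = generator(*inputs)
    g_adv = discriminator(inputs[0], fake_image)
    g_loss = F.binary_cross_entropy_with_logits(g_adv, torch.ones_like(g_adv)) + \
             100.0 * F.l1_loss(fake_image, real_image)       # Pix2Pix-style L1 term
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```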
A block diagram of an apparatus 700 for generating a fused image of human clothing according to the present disclosure is described below with reference to fig. 7.
The apparatus 700 may comprise a first segmentation unit 701, a second segmentation unit 702 and an image fusion unit 703. The first segmentation unit is used for generating a first human body segmentation map from the first image, where the first image is an image of a first human body wearing a first garment and the first human body segmentation map identifies different human body parts of the first human body in the first image. The second segmentation unit is used for generating a second human body segmentation map from the first image and a second image containing a second garment, where the second human body segmentation map identifies different human body parts of the first human body after the second garment is worn. The image fusion unit is used for generating, with an image fusion network, a fusion image of the first human body wearing the second garment from the second image, the first human body segmentation map and the second human body segmentation map.
According to some embodiments, the image fusion network is a PixtoPix network. According to some embodiments, the image fusion network is a fusion of a PixtoPix network and a U-Net network.
According to another aspect of the present disclosure, there is also provided a computing device, which may include: a processor; and a memory storing a program comprising instructions which, when executed by the processor, cause the processor to perform the above-described method of generating a fused image of a person's clothing.
According to yet another aspect of the present disclosure, there is also provided a computer-readable storage medium storing a program, which may include instructions that, when executed by a processor of a server, cause the server to perform the above-described method of generating a human body clothing fusion image.
Referring to fig. 8, a block diagram of a computing device 800, which may be a server or a client of the present disclosure and which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described.
Software elements (programs) may be located in the working memory 814 including, but not limited to, an operating system 816, one or more application programs 818, drivers, and/or other data and code. Instructions for performing the above-described methods and steps may be included in one or more applications 818, and the above-described methods may be implemented by the instructions of the one or more applications 818 being read and executed by the processor 804. Executable code or source code for the instructions of the software elements (programs) may also be downloaded from a remote location.
It will also be appreciated that various modifications may be made according to specific requirements. For example, customized hardware might be used, and/or particular elements might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the disclosed methods and apparatus may be implemented by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware description language such as VERILOG, VHDL or C++, using logic and algorithms according to the present disclosure.
It should also be understood that the foregoing method may be implemented in a server-client mode. For example, a client may receive data input by a user and send the data to a server. The client may also receive data input by the user, perform part of the processing in the foregoing method, and transmit the data obtained by the processing to the server. The server may receive data from the client and perform the aforementioned method or another part of the aforementioned method and return the results of the execution to the client. The client may receive the results of the execution of the method from the server and may present them to the user, for example, through an output device. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computing devices and having a client-server relationship to each other. The server may be a server of a distributed system or a server incorporating a blockchain. The server can also be a cloud server, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology.
It should also be understood that the components of computing device 800 may be distributed across a network. For example, some processes may be performed using one processor while other processes may be performed by another processor that is remote from the one processor. Other components of the computing device 800 may also be similarly distributed. As such, computing device 800 may be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents. Further, the steps may be performed in an order different from that described in the present disclosure, and various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.
Claims (20)
1. A method of generating a fused image of a body garment, comprising:
generating a first human body segmentation map according to a first image, wherein the first image is an image of a first human body wearing first clothes, and the first human body segmentation map identifies a human body part of the first human body in the first image;
generating a second human body segmentation map according to the first image and a second image containing a second garment, wherein the second human body segmentation map identifies the human body part of the first human body after the second garment is worn; and
inputting the first image, the second image, the first human body segmentation chart and the second human body segmentation chart into an image fusion network to generate a fusion image of the first human body wearing the second clothes.
2. The method of claim 1, wherein the image fusion network is a PixtoPix network.
3. The method of claim 1, wherein the image fusion network is a fusion network of a PixtoPix network and a U-net network.
4. The method of claim 1, wherein the image fusion network is obtained by training it as the generator side of a generative adversarial network.
5. The method of any of claims 1-4, wherein each of the first and second body segmentation maps identifies a portion covered by clothing and a skin portion.
6. The method of any of claims 1-4, wherein the image fusion network employs an attention mechanism in an upsampling layer.
7. The method of any of claims 1-4, wherein generating the fused image of the first person wearing the second garment further comprises:
generating a preliminary fused image and a mask using the second image, the first segmentation map and the second segmentation map, and
and performing alpha fusion on the preliminary fusion image and the first image through the mask to generate the fusion image.
8. The method of any of claims 1-4, wherein the second image is an image of the second garment being worn on a second person.
9. The method of claim 8, wherein inputting the first image and the first segmentation map into the image fusion network further comprises:
removing the area where the first clothing is located from the first human body segmentation graph to generate a first human body segmentation graph without clothing information; and
inputting the generated first human body segmentation map without clothing information into the image fusion network.
10. The method according to any one of claims 1-4, wherein inputting the second image, the first human body segmentation map and the second human body segmentation map into an image fusion network comprises:
inputting the second image and the first human body segmentation map into the image fusion network at a down-sampling stage, and inputting the second human body segmentation map into the image fusion network at an up-sampling stage.
11. The method of claim 1, wherein the second human segmentation map is input to the image fusion network by a SPADE algorithm.
12. The method of claim 11, wherein the image fusion network has a plurality of upsampling layers, and wherein inputting the second human body segmentation map into the image fusion network comprises: inputting the second human body segmentation map into the image fusion network through a SPADE algorithm between each pair of adjacent up-sampling layers.
13. The method of any of claims 1-4, wherein generating the first human segmentation map comprises: generating the first human segmentation map from the first image using a first neural network different from the image fusion network.
14. The method according to any one of claims 1-4, wherein generating the second body segmentation map comprises: generating the second body segmentation map from the first image and the second image using a second neural network different from the image fusion network.
15. An apparatus for generating a fused image of a body garment, comprising:
a first segmentation unit, configured to generate a first human body segmentation map according to a first image, wherein the first image is an image of a first human body wearing first clothes, and the first human body segmentation map identifies different human body parts of the first human body in the first image;
a second segmentation unit, configured to generate a second human body segmentation map according to the first image and a second image including a second garment, where the second human body segmentation map identifies different human body parts of the first human body after wearing the second garment; and
an image fusion unit, configured to generate, using an image fusion network, a fusion image of the first human body wearing the second garment according to the second image, the first human body segmentation map and the second human body segmentation map.
16. The apparatus of claim 15, wherein the image fusion network is a PixtoPix network.
17. The apparatus of claim 15, wherein the image fusion network is a fusion network of a PixtoPix network and a U-net network.
18. The apparatus of claim 15, wherein the image fusion network is obtained by training it as the generator side of a generative adversarial network.
19. A computing device, comprising:
a processor; and
a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-14.
20. A computer readable storage medium storing a program, the program comprising instructions that when executed by a processor of a computing device cause the computing device to perform the method of any of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011192303.XA CN112330580B (en) | 2020-10-30 | 2020-10-30 | Method, device, computing equipment and medium for generating human body clothing fusion image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011192303.XA CN112330580B (en) | 2020-10-30 | 2020-10-30 | Method, device, computing equipment and medium for generating human body clothing fusion image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112330580A (en) | 2021-02-05
CN112330580B CN112330580B (en) | 2024-08-13 |
Family
ID=74296898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011192303.XA Active CN112330580B (en) | 2020-10-30 | 2020-10-30 | Method, device, computing equipment and medium for generating human body clothing fusion image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330580B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269072A (en) * | 2021-05-18 | 2021-08-17 | 咪咕文化科技有限公司 | Picture processing method, device, equipment and computer program |
CN114663552A (en) * | 2022-05-25 | 2022-06-24 | 武汉纺织大学 | Virtual fitting method based on 2D image |
CN114913388A (en) * | 2022-04-24 | 2022-08-16 | 深圳数联天下智能科技有限公司 | Method for training fitting model, method for generating fitting image and related device |
WO2023051244A1 (en) * | 2021-09-29 | 2023-04-06 | 北京字跳网络技术有限公司 | Image generation method and apparatus, device, and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097564A (en) * | 2019-04-04 | 2019-08-06 | 平安科技(深圳)有限公司 | Image labeling method, device, computer equipment and storage medium based on multi-model fusion |
US20200143204A1 (en) * | 2018-11-01 | 2020-05-07 | International Business Machines Corporation | Image classification using a mask image and neural networks |
CN111275518A (en) * | 2020-01-15 | 2020-06-12 | 中山大学 | Video virtual fitting method and device based on mixed optical flow |
WO2020119311A1 (en) * | 2018-12-14 | 2020-06-18 | 深圳市商汤科技有限公司 | Neural network training method and image matching method and device |
CN111709874A (en) * | 2020-06-16 | 2020-09-25 | 北京百度网讯科技有限公司 | Image adjusting method and device, electronic equipment and storage medium |
CN111787242A (en) * | 2019-07-17 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Method and apparatus for virtual fitting |
-
2020
- 2020-10-30 CN CN202011192303.XA patent/CN112330580B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200143204A1 (en) * | 2018-11-01 | 2020-05-07 | International Business Machines Corporation | Image classification using a mask image and neural networks |
WO2020119311A1 (en) * | 2018-12-14 | 2020-06-18 | 深圳市商汤科技有限公司 | Neural network training method and image matching method and device |
CN110097564A (en) * | 2019-04-04 | 2019-08-06 | 平安科技(深圳)有限公司 | Image labeling method, device, computer equipment and storage medium based on multi-model fusion |
CN111787242A (en) * | 2019-07-17 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Method and apparatus for virtual fitting |
CN111275518A (en) * | 2020-01-15 | 2020-06-12 | 中山大学 | Video virtual fitting method and device based on mixed optical flow |
CN111709874A (en) * | 2020-06-16 | 2020-09-25 | 北京百度网讯科技有限公司 | Image adjusting method and device, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
PAN HUO et al.: "A part-based and feature fusion method for clothing classification", 《ADVANCES IN MULTIMEDIA INFORMATION PROCESSING》, 31 December 2016 (2016-12-31) *
SU Zhuo; YU Chunyang: "Virtual fitting algorithm based on 2D image transformation", Computer Technology and Development, no. 02, 19 October 2017 (2017-10-19) *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269072A (en) * | 2021-05-18 | 2021-08-17 | 咪咕文化科技有限公司 | Picture processing method, device, equipment and computer program |
CN113269072B (en) * | 2021-05-18 | 2024-06-07 | 咪咕文化科技有限公司 | Picture processing method, device, equipment and computer program |
WO2023051244A1 (en) * | 2021-09-29 | 2023-04-06 | 北京字跳网络技术有限公司 | Image generation method and apparatus, device, and storage medium |
CN114913388A (en) * | 2022-04-24 | 2022-08-16 | 深圳数联天下智能科技有限公司 | Method for training fitting model, method for generating fitting image and related device |
CN114913388B (en) * | 2022-04-24 | 2024-05-31 | 深圳数联天下智能科技有限公司 | Method for training fitting model, method for generating fitting image and related device |
CN114663552A (en) * | 2022-05-25 | 2022-06-24 | 武汉纺织大学 | Virtual fitting method based on 2D image |
CN114663552B (en) * | 2022-05-25 | 2022-08-16 | 武汉纺织大学 | Virtual fitting method based on 2D image |
Also Published As
Publication number | Publication date |
---|---|
CN112330580B (en) | 2024-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112330580B (en) | Method, device, computing equipment and medium for generating human body clothing fusion image | |
US11573641B2 (en) | Gesture recognition system and method of using same | |
CN111787242B (en) | Method and apparatus for virtual fitting | |
US20190213792A1 (en) | Providing Body-Anchored Mixed-Reality Experiences | |
CN114303120A (en) | Virtual keyboard | |
KR20180126561A (en) | Create an automated avatar | |
US11176723B2 (en) | Automated dance animation | |
KR102506738B1 (en) | snow texture inpainting | |
US11222455B2 (en) | Management of pseudorandom animation system | |
JP7421010B2 (en) | Information display method, device and storage medium | |
CN118247348A (en) | Method for determining pose of first wide-angle image, data processing system and non-transitory machine readable medium | |
CN116311519B (en) | Action recognition method, model training method and device | |
US20240331305A1 (en) | Virtual clothing changing method, apparatus, electronic device and readable medium | |
CN114550313B (en) | Image processing method, neural network, training method, training device and training medium thereof | |
CN116030185A (en) | Three-dimensional hairline generating method and model training method | |
CN114119935B (en) | Image processing method and device | |
EP3692511A2 (en) | Customizing appearance in mixed reality | |
CN116245998B (en) | Rendering map generation method and device, and model training method and device | |
CN115661375B (en) | Three-dimensional hair style generation method and device, electronic equipment and storage medium | |
CN114120412B (en) | Image processing method and device | |
CN114120448B (en) | Image processing method and device | |
KR20240128015A (en) | Real-time clothing exchange | |
CN114119154A (en) | Virtual makeup method and device | |
CN117422831B (en) | Three-dimensional eyebrow shape generating method and device, electronic equipment and storage medium | |
CN115423827B (en) | Image processing method, image processing device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20240715 Address after: Room 501, No. 16-2 Wanghai Road, Siming District, Xiamen City, Fujian Province 361000 Applicant after: Xiamen Wozhuan Technology Co.,Ltd. Country or region after: China Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085 Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Country or region before: China |
|
GR01 | Patent grant | ||
GR01 | Patent grant |