CN112330580A - Method, device, computing equipment and medium for generating human body clothes fusion image - Google Patents
Method, device, computing equipment and medium for generating human body clothes fusion image
- Publication number
- CN112330580A (application CN202011192303.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- human body
- network
- segmentation map
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000004927 fusion Effects 0.000 title claims abstract description 109
- 238000000034 method Methods 0.000 title claims abstract description 57
- 230000011218 segmentation Effects 0.000 claims abstract description 146
- 238000013528 artificial neural network Methods 0.000 claims description 15
- 238000005070 sampling Methods 0.000 claims description 12
- 230000015654 memory Effects 0.000 claims description 9
- 238000004422 calculation algorithm Methods 0.000 claims description 7
- 238000012549 training Methods 0.000 claims description 6
- 230000007246 mechanism Effects 0.000 claims description 5
- 238000013473 artificial intelligence Methods 0.000 abstract description 6
- 238000004891 communication Methods 0.000 description 11
- 238000005516 engineering process Methods 0.000 description 10
- 238000010586 diagram Methods 0.000 description 9
- 238000012545 processing Methods 0.000 description 9
- 230000000694 effects Effects 0.000 description 7
- 238000013527 convolutional neural network Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 4
- 230000006870 function Effects 0.000 description 3
- 210000004209 hair Anatomy 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 230000003936 working memory Effects 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 210000000887 face Anatomy 0.000 description 2
- 210000003423 ankle Anatomy 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000010267 cellular communication Effects 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 210000003467 cheek Anatomy 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 210000005069 ears Anatomy 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 210000001508 eye Anatomy 0.000 description 1
- 210000002683 foot Anatomy 0.000 description 1
- 210000001061 forehead Anatomy 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 210000004247 hand Anatomy 0.000 description 1
- 210000003128 head Anatomy 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000003924 mental process Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 230000036544 posture Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 210000000707 wrist Anatomy 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Multimedia (AREA)
- Accounting & Taxation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Finance (AREA)
- Human Computer Interaction (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Image Analysis (AREA)
Abstract
The disclosure provides a method, a device, a computing device and a medium for generating a human body clothing fusion image, and relates to the technical field of artificial intelligence, in particular to computer vision. The method comprises the following steps: generating a first human body segmentation map from a first image, wherein the first image is an image of a first human body wearing a first garment, and the first human body segmentation map identifies the human body parts of the first human body in the first image; generating a second human body segmentation map from the first image and a second image containing a second garment, wherein the second human body segmentation map identifies the human body parts of the first human body after the second garment is worn; and inputting the first image, the second image, the first human body segmentation map and the second human body segmentation map into an image fusion network to generate a fusion image of the first human body wearing the second garment.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to computer vision, and more particularly to a method, an apparatus, a computing device and a medium for generating a human body clothing fusion image.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and it spans both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing and knowledge graph technologies.
With the development of virtual fitting and virtual clothes changing, the problem of how to effectively generate a fusion image of a human body and clothes is receiving more and more attention. Virtual fitting requires synthesizing a target garment onto a target person in order to show the effect of that person wearing the garment. Traditional fusion methods cannot handle mismatches between the garment and the person's pose, nor can they repair the skin regions exposed when the target garment and the garment originally worn by the target person differ in size.
Disclosure of Invention
According to one aspect of the present disclosure, a method of generating a human body clothing fusion image is disclosed. The method may include generating a first human body segmentation map from a first image. The first image is an image of a first human body wearing a first garment, and the first human body segmentation map identifies a human body part of the first human body in the first image. The method may further include generating a second human body segmentation map from the first image and a second image containing a second garment, the second human body segmentation map identifying the human body part of the first human body after the second garment is worn. The method may further include inputting the first image, the second image, the first human body segmentation map and the second human body segmentation map into an image fusion network to generate a fusion image of the first human body wearing the second garment.
According to another aspect of the present disclosure, an apparatus for generating a fused image of a human body garment is disclosed. The apparatus may comprise a first segmentation unit configured to generate a first body segmentation map from a first image, the first image being an image of a first body wearing a first garment, the first body segmentation map identifying different body parts of the first body in the first image. The apparatus may further comprise a second segmentation unit configured to generate a second body segmentation map from the first image and a second image comprising a second garment, the second body segmentation map identifying different body parts of the first body after wearing the second garment. The apparatus may further include an image fusion unit configured to generate a fused image of the first person wearing the second garment from the second image, the first body segmentation map and the second body segmentation map using an image fusion network.
According to another aspect of the disclosure, a computing device is disclosed that may include: a processor; and a memory storing a program comprising instructions which, when executed by the processor, cause the processor to perform the above-described method for generating a body-clothing-fusion image.
According to yet another aspect of the present disclosure, a computer-readable storage medium storing a program is disclosed, the program may include instructions which, when executed by a processor of a server, cause the server to perform the above-described method for generating a human body clothing fusion image.
Drawings
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of generating a body clothing fusion image according to an embodiment of the present disclosure;
FIGS. 3(a)-3(e) are diagrams of example images to be processed and generated according to embodiments of the present disclosure;
FIG. 4 is a flow chart of a method for generating a fused image of a human body garment according to another embodiment of the present disclosure;
FIG. 5 is an example neural network structure diagram for generating a human segmentation map, according to an embodiment of the present disclosure;
FIG. 6 is an exemplary diagram of an image fusion network for generating a body clothing fusion image according to an embodiment of the present disclosure;
fig. 7 is a block diagram illustrating a structure of an apparatus for generating a fusion image of human clothing according to an embodiment of the present disclosure; and
FIG. 8 illustrates a block diagram of an exemplary server and client that can be used to implement embodiments of the present disclosure.
Detailed Description
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the method of generating a body-clothing fusion image according to the present disclosure.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may use the client device 101, 102, 103, 104, 105, and/or 106 to enable generation of a body clothing fusion image. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing system in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The data store 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 130 may be of different types. In certain embodiments, the data store used by the server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to the command.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
A flowchart of a method for generating a body-clothing fusion image according to an embodiment of the present disclosure is described below with reference to fig. 2. For example, using the method shown in FIG. 2, a fused image may be generated for the person in FIG. 3(a) and the jacket in FIG. 3 (b). This can be used in a scene such as virtual fitting.
At step S201, a first human segmentation map is generated from the first image. The first image is an image of a first person wearing a first garment. The first body segmentation map identifies a body part of a first body in the first image. The first human body here is a target human body for fusion.
At step S202, a second body segmentation map is generated from the first image and a second image containing a second garment. The second body segmentation map identifies a body part of the first body after wearing the second garment.
At step S203, the first image, the second image, the first human body segmentation map and the second human body segmentation map are input into an image fusion network, and a fusion image of the first human body wearing the second garment is generated.
Through the steps described with reference to fig. 2, the first human body segmentation map and the second human body segmentation map, namely the segmentation map of the original human body and the segmentation map of the human body fused with the target garment, are generated first, and then these segmentation maps, which identify the human body parts, are used to generate the final image result. As a result, the position at which the human body and the garment are fused is more accurate and the image fusion effect is improved.
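As a concrete, non-limiting illustration of steps S201 to S203, the following minimal sketch (written in Python with PyTorch-style tensors; the function name, network handles and tensor shapes are assumptions made for this illustration rather than features of the disclosed embodiments) shows how the three stages fit together:

```python
# Minimal sketch of the three-step pipeline (S201-S203); module names,
# signatures and tensor shapes are illustrative assumptions, not the
# patent's actual implementation.
import torch

def generate_fusion_image(first_image: torch.Tensor,       # (1, 3, H, W) person wearing garment 1
                          second_image: torch.Tensor,      # (1, 3, H, W) target garment 2
                          seg_net, seg_fusion_net, image_fusion_net) -> torch.Tensor:
    # S201: segmentation map of the person as-is (hypothetical seg_net)
    first_seg = seg_net(first_image)                        # (1, K, H, W) part labels

    # S202: predicted segmentation of the same person wearing garment 2
    second_seg = seg_fusion_net(first_image, second_image)  # (1, K, H, W)

    # S203: fuse everything into the final try-on image
    fused = image_fusion_net(first_image, second_image, first_seg, second_seg)
    return fused                                            # (1, 3, H, W)
```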
The above method steps are further described in conjunction with FIGS. 3(a)-3(d). Fig. 3(a) is an example of a first image showing a first human body wearing a dark letter-print jacket (the original garment). The first human body here is the target human body for fusion. At step S201, a first human body segmentation map of the first human body wearing the first garment is obtained based on this first image. The obtained first human body segmentation map may be as shown in fig. 3(c).
Fig. 3(b) is an example of a second image, showing a white polka-dot jacket as an example of the target garment, that is, the second garment. At step S202, a second human body segmentation map of the first human body after wearing the second garment is obtained based on the first image and the second image containing the second garment. Fig. 3(d) is an example of the second human body segmentation map, showing the body part segmentation result after the person wearing the dark letter-print jacket in the above example puts on the white polka-dot jacket.
Here, the order of steps S201 and S202 is not limited to this, and step S202 may be performed first and then step S201 may be performed, or both may be performed in parallel.
At step S203, the first image, the second image, the first human body segmentation map and the second human body segmentation map are input into an image fusion network, and a fusion image of the first human body wearing the second garment is generated. Fig. 3(e) shows an example of such a fusion image, in which the first human body, that is, the target human body, wears the white polka-dot jacket representing the target garment.
The image fusion network may take the form of any of various neural networks trained to perform image fusion. For example, the image fusion network may be a convolutional neural network. According to some embodiments, the image fusion network may be a PixtoPix network, also referred to as a Pix2Pix or Pixel2Pixel network. The PixtoPix network may output the generated picture together with the real picture for a discriminator to distinguish; adopting the PixtoPix architecture is therefore beneficial to training the model and enhancing the realism of the generated pictures. According to some embodiments, the image fusion network is obtained by training it as the generator side of a generative adversarial network (GAN). Training against a discriminator can increase the realism of the generative model.
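For illustration only, the following is a minimal sketch of a Pix2Pix-style conditional "patch" discriminator in PyTorch; the layer sizes and the choice of a patch discriminator are assumptions of this sketch rather than requirements of the disclosure:

```python
# A minimal PatchGAN-style conditional discriminator, as commonly paired with
# Pix2Pix generators; channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, padding=1),   # one real/fake score per patch
        )

    def forward(self, conditioning: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # The candidate image (generated or real) is judged together with the
        # conditioning inputs, as in conditional GAN training.
        return self.net(torch.cat([conditioning, image], dim=1))
```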
According to some embodiments, the image fusion network is a fusion of a PixtoPix network and a U-Net network. A U-Net uses a symmetrical U-shaped structure containing a contracting path and an expanding path. A typical U-Net consists of convolutional layers, down-sampling layers, up-sampling (deconvolution) layers and activation layers, with corresponding up-sampling and down-sampling layers connected to each other. Adopting a U-Net structure preserves detail information at different scales and increases the realism of hair, facial details and the like in the generated fusion image.
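A minimal sketch of such a U-shaped encoder-decoder with skip connections is given below for illustration; depths, channel counts and activation choices are assumptions of the sketch, not the configuration of the disclosed embodiments:

```python
# A tiny U-Net-style generator: two down-sampling steps, two up-sampling
# steps, and a skip connection between the matching resolutions.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(inplace=True))
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(inplace=True))
        # skip connection: the decoder sees its own features plus the matching encoder features
        self.up2 = nn.ConvTranspose2d(32 + 32, out_ch, 4, 2, 1)

    def forward(self, x):
        d1 = self.down1(x)              # H/2, keeps fine detail for the skip
        d2 = self.down2(d1)             # H/4, coarse features
        u1 = self.up1(d2)               # back to H/2
        return torch.tanh(self.up2(torch.cat([u1, d1], dim=1)))  # fuse skip, back to H
```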
The first and second human body segmentation maps identify human body parts, which makes it possible to generate content differently at different body parts. A body segmentation map may identify different parts of the body with different numerical values, for example a value of "1" for the face, "2" for the neck, "3" for the jacket and "4" for the arms, although the present disclosure is not limited thereto. The body segmentation map only needs to distinguish the desired regions and does not show specific details or textures; during image fusion it can function as a mask. For example, each of the first and second human body segmentation maps may identify at least one of the following body parts: face, neck, arms, hands, shoulders, torso, legs, feet. This facilitates detailed processing and image generation for different regions. According to some embodiments, each of the first and second human body segmentation maps identifies the portion covered by clothing and the skin portion, i.e., distinguishes garment regions from non-garment regions. This benefits image processing around the boundary between clothing and skin, so that the generated image has richer detail and a more realistic effect. Depending on the usage scenario, the human body segmentation map may also identify other parts; for example, when the target item to be fused is a hat or jewelry, the segmentation map may identify facial parts in detail, such as hair, forehead, cheeks, ears and eyes.
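For illustration, the following sketch shows how such a label-encoded segmentation map can be turned into per-part masks and one-hot channels; the specific label values follow the example above, and NumPy is an assumed choice:

```python
# Label-encoded body segmentation map used as a mask; label values
# (face=1, neck=2, jacket=3, arm=4) follow the example in the text.
import numpy as np

seg = np.array([[0, 1, 1],
                [2, 3, 3],
                [4, 3, 3]])            # toy 3x3 segmentation map, 0 = background

FACE, NECK, JACKET, ARM = 1, 2, 3, 4

jacket_mask = (seg == JACKET)          # boolean mask of the garment region
skin_mask = np.isin(seg, [FACE, NECK, ARM])

# one-hot channels, the form usually fed to a network
num_parts = 5
one_hot = np.eye(num_parts)[seg]       # shape (3, 3, 5)
```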
According to some embodiments, the image fusion network employs an attention mechanism in the up-sampling layers. An attention mechanism is realized by an attention function or attention model that learns the importance of each element of a sequence, merges the elements according to their importance, and learns attention weights for the attended regions. Attention allows more computational resources to be devoted to the important target regions, extracting more detailed information about the target of interest while suppressing irrelevant information. By adopting attention in image fusion, the network can focus on the specific regions to be generated, which is particularly beneficial for generating skin near boundary regions such as the arms or the neck.
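As an illustrative sketch only, a generic spatial self-attention block of the kind that can be inserted into an up-sampling layer is shown below; it is not the specific attention design of the disclosed embodiments:

```python
# Generic spatial self-attention: each output position takes a weighted sum
# over features at all positions, so skin regions can borrow nearby texture.
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned blend weight

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # (n, hw, c/8)
        k = self.key(x).flatten(2)                            # (n, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)                   # (n, hw, hw) importance weights
        v = self.value(x).flatten(2)                          # (n, c, hw)
        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)     # weighted sum over positions
        return self.gamma * out + x
```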
According to some embodiments, generating the fusion image of the first human body wearing the second garment further comprises: generating a preliminary fusion image and a mask using the second image, the first human body segmentation map and the second human body segmentation map, and performing alpha fusion of the preliminary fusion image and the first image through the mask to generate the fusion image. Alpha fusion is the process of superimposing a foreground onto a background according to transparency. The preliminary fusion image may be a three-channel image, and the transparency mask may be referred to as an alpha mask. Using a mask and alpha fusion can further increase the realism of details in the generated picture.
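A minimal sketch of the alpha fusion step is given below for illustration; tensor shapes are assumptions:

```python
# Alpha fusion: composite the generated three-channel output with the
# original photo through a predicted transparency (alpha) mask.
import torch

def alpha_fuse(preliminary: torch.Tensor,   # (1, 3, H, W) generated foreground
               original: torch.Tensor,      # (1, 3, H, W) first image (background)
               alpha_mask: torch.Tensor) -> torch.Tensor:  # (1, 1, H, W), values in [0, 1]
    # where the mask is 1 the generated pixels are kept;
    # where it is 0 the original photo shows through unchanged
    return alpha_mask * preliminary + (1.0 - alpha_mask) * original
```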
According to some embodiments, the second image is an image of the second garment worn on a second human body. Since the second image need not show the garment alone but may show it on a person, the application scenarios become broader: beyond virtual fitting, for example, images of two people exchanging clothes can be generated.
According to some embodiments, inputting the first image and the first human body segmentation map into the image fusion network further comprises: removing the region where the first garment is located from the first human body segmentation map to generate a first human body segmentation map without garment information; and inputting the generated first human body segmentation map without garment information into the image fusion network. Removing the interfering garment region before it enters the image fusion network reduces the amount of computation and the interference, yielding a more accurate fusion result.
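For illustration, the following sketch removes the region of the first garment from the segmentation map (and, optionally, from the person image); the label value and array layout are assumptions of the sketch:

```python
# Strip the original garment before fusion: pixels labelled as the first
# garment are cleared in the segmentation map and blanked in the photo.
import numpy as np

JACKET_LABEL = 3                               # assumed label for the original garment

def remove_garment(first_seg: np.ndarray,      # (H, W) integer part labels
                   first_image: np.ndarray):   # (H, W, 3) person photo
    garment = first_seg == JACKET_LABEL
    seg_no_garment = np.where(garment, 0, first_seg)   # garment region -> background
    image_no_garment = first_image.copy()
    image_no_garment[garment] = 0                       # blank out the garment pixels
    return seg_no_garment, image_no_garment
```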
According to some embodiments, inputting the second image, the first human body segmentation map and the second human body segmentation map into the image fusion network comprises: inputting the second image and the first human body segmentation map into the image fusion network at the down-sampling stage, and inputting the second human body segmentation map into the image fusion network at the up-sampling stage. Inputting the second human body segmentation map at the up-sampling stage provides segmentation information about the positions where the human body and the garment are fused, so that the fusion result is more accurate.
According to some embodiments, the second human body segmentation map is input to the image fusion network through a SPADE (spatially-adaptive normalization) algorithm. The SPADE algorithm can generate a corresponding realistic image from a semantic segmentation map. Combining the SPADE algorithm with the generated human body segmentation map provides segmentation information about the positions where the human body and the garment are fused, so that an accurate and realistic fusion image is generated.
According to some embodiments, the image fusion network has a plurality of up-sampling layers, and inputting the second human body segmentation map into the image fusion network comprises: inputting the second human body segmentation map into the image fusion network through the SPADE algorithm between each pair of adjacent up-sampling layers. Fusing the segmentation map once after each decoding step increases the realism of the generated image.
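As an illustrative sketch, a SPADE-style block that could sit between two adjacent up-sampling layers is shown below; the channel sizes are assumptions, and the block is a generic rendering of the SPADE idea rather than the exact layer of the disclosed embodiments:

```python
# SPADE-style normalization: the second body segmentation map, resized to the
# current feature resolution, predicts per-pixel scale and bias that modulate
# the normalized decoder features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADEBlock(nn.Module):
    def __init__(self, feat_channels: int, seg_channels: int, hidden: int = 64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(nn.Conv2d(seg_channels, hidden, 3, padding=1),
                                    nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        seg = F.interpolate(seg, size=feat.shape[2:], mode='nearest')  # match resolution
        s = self.shared(seg)
        gamma, beta = self.to_gamma(s), self.to_beta(s)
        return self.norm(feat) * (1 + gamma) + beta   # spatially-adaptive modulation
```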
According to some embodiments, generating the first human body segmentation map comprises: generating the first human body segmentation map from the first image using a first neural network different from the image fusion network. Using a neural network allows the human body segmentation map to be generated efficiently and accurately, thereby enabling efficient generation of the final fusion image.
According to some embodiments, generating the second human body segmentation map comprises: generating the second human body segmentation map from the first image and the second image using a second neural network different from the image fusion network. Any neural network trained to identify the body parts of a human body after it is fused with a garment, given separate images of the target human body and the target garment, may be used to generate the second human body segmentation map. Using such a network, the body segmentation regions after the garment and the human body are fused can be generated accurately from the images of the target human body and the target garment, providing a more accurate fusion position for generating the final picture.
Fig. 4 is a flowchart of a method for generating a body-clothing fusion image according to another embodiment of the present disclosure. For example, a fused image may be generated for the person in fig. 3(a) and the jacket in fig. 3(b) using the method shown in fig. 4.
At step S401, a human body segmentation map of the first human body wearing the first garment, referred to for distinction as the first human body segmentation map, is obtained based on a first image of the first human body wearing the first garment. Fig. 3(a) is an example of the first image, showing a first human body wearing a dark letter-print jacket (the original garment). The first human body here is the target human body for fusion. The obtained first human body segmentation map may be as shown in fig. 3(c).
Step S401 may be implemented using any segmentation model or CNN. The obtained first human body segmentation map may be a label map that encodes the regions of the human body parts; for example, different regions of the human body may be represented in the first human body segmentation map with different numerical values. The different regions may include the face, neck, arms, jacket, and so on.
At step S402, a human body segmentation map of the first human body after wearing the second garment, referred to herein as the second human body segmentation map, is obtained based on the first image and the second image containing the second garment. Fig. 3(b) is an example of the second image, showing a white polka-dot jacket as an example of the target garment, that is, the second garment. The second image may also be an image of the garment alone rather than of the garment worn on a person. The second human body segmentation map has a format similar to that of the first human body segmentation map, but it identifies the body parts exhibited when the second garment is fused with the first human body, i.e., when the second garment is worn on the first human body. Continuing the above example, step S402 yields the human body segmentation map, shown in fig. 3(d), of the person originally wearing the dark letter-print jacket after putting on the white polka-dot jacket. The first garment and the second garment may differ in shape, style, color and so on; for example, besides the differences in pattern and color, it can be seen that the sleeves of the first garment are shorter than those of the second garment.
The second human body segmentation map may be generated by a neural network such as the one shown in fig. 5. For example, a CNN may be trained so that, given the human body key points, the human body segmentation map and the garment image as input, it outputs the human body segmentation map of the target human body wearing the target garment. The neural network of fig. 5 is a two-branch U-Net CNN, but the present disclosure is not limited thereto; any method or model that can generate body part segmentation results for a garment and a human body may be applied here.
For example, the second human body segmentation map may be generated by a model including a feature-matching neural network. The model may have two inputs: the first input receives the target garment image and extracts the target garment features, and the second input receives the target human body image and extracts the target human body features. After operations such as convolution and up-sampling, a region segmentation image of the target human body combined with the target garment can be obtained.
The combined region segmentation image may be regarded as an effect map of the target person "wearing" the target garment. On the one hand, part features of the target human body, such as the head, neck, shoulders and arms, and the positions of these parts in the target human body image, can be obtained by extracting the human body key points and the human body segmentation map of the target human body. On the other hand, style features of the target garment, such as whether it has long or short sleeves, a round or V-shaped neck, and the positions of the collar, cuffs and hem in the target garment image, may be extracted. Based on the extracted features, the target garment is combined with the target human body to obtain masks of the portions of the target human body covered by the target garment, as shown by the output on the right side of fig. 5; the mask corresponds to the hatched portion on the target human body on the right side of fig. 5. In this embodiment, the mask may be used as the region segmentation image after the target human body and the target garment are combined.
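For illustration only, a minimal two-branch sketch is given below; the encoder/decoder depths and the way the person inputs are packed are assumptions of the sketch, not the structure of fig. 5:

```python
# Two-branch idea: one branch encodes the target garment, the other encodes
# the target person (image, key-point heat maps, current segmentation); the
# merged features are decoded into the predicted post-try-on segmentation.
import torch
import torch.nn as nn

class TwoBranchSegPredictor(nn.Module):
    def __init__(self, person_ch: int, garment_ch: int, num_parts: int):
        super().__init__()
        self.person_enc = nn.Sequential(nn.Conv2d(person_ch, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.garment_enc = nn.Sequential(nn.Conv2d(garment_ch, 64, 4, 2, 1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_parts, 3, padding=1),          # per-part logits
        )

    def forward(self, person_inputs, garment_image):
        # person_inputs: person image, key-point heat maps and first segmentation
        # map stacked along the channel axis; garment_image: target garment photo
        f = torch.cat([self.person_enc(person_inputs), self.garment_enc(garment_image)], dim=1)
        return self.decoder(f)   # argmax over channels gives the second segmentation map
```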
Here, the order of steps S401 and S402 is not limited to this, and step S402 may be performed first and then step S401 may be performed, or both may be run in parallel.
At step S403, an image fusion network is employed. The first image, the second image and the first human body segmentation map are input into the image fusion network at the down-sampling layers, and the second human body segmentation map is input into the image fusion network through the up-sampling layers, to generate the desired target fusion image. An example of the generated target fusion image is shown in fig. 3(e), in which the first human body, that is, the target human body, wears the white polka-dot jacket representing the target garment. The image fusion network may be a neural network, for example a CNN, and may adopt the PixtoPix basic architecture. An image of the person with the region of the first garment removed may be obtained from the first human body segmentation map and the first image and used as an input to the network; this step can also be performed inside the image fusion network. A SPADE layer may be added to the up-sampling layers, and the second human body segmentation map is input through the SPADE layer, so that the second human body segmentation map serves as position information.
The image fusion network may further adopt a fused architecture of PixtoPix and U-Net networks (not shown). For example, the image fusion network may have a plurality of up-sampling layers and a corresponding number of down-sampling layers. Pictures are progressively encoded in the down-sampling layers and progressively decoded in the up-sampling layers, and corresponding up-sampling and down-sampling layers are connected to each other; for example, the up-sampling layer at the largest resolution may be connected to the down-sampling layer at the largest resolution. Such a network structure has at least the advantage that details such as faces and hair, which would otherwise be degraded by being encoded and then decoded, can be preserved and refined through these connections, making the face, for example, clearer.
Fig. 6 shows an example of an image fusion network 600. The image of the second garment, i.e., the target garment, is input to the image fusion network 600 as a first input. The first image and the first human body segmentation map form the basis of a second input: the region where the first garment is located is removed from the first image according to the segmentation result of the first human body segmentation map, and the resulting garment-free image is fed into the network. For example, as shown in fig. 6, the second input to the network 600 is an intermediate picture generated from the first image of the first human body with the area occupied by the dark letter-print jacket removed. The first input and the second input may be encoded by the down-sampling layers of the image fusion network 600.
The image fusion network may employ attention in the up-sampling layers. The effect of attention here is to let a pixel take values from other pixels, e.g. neighboring regions, by learning weights. For example, when the first garment is a long-sleeved jacket and the second garment is a short-sleeved jacket, texture may be taken from near the arm position to complete the skin that must be generated around the arm. Such processing makes the generated image more realistic, and in particular helps with generating skin at positions such as the arms, neck, wrists and ankles.
As shown in fig. 6, the second human body segmentation map may be input into the network multiple times through SPADE layers to provide the required generation information at different pixel resolutions. Each input fuses the second human body segmentation map with the data in the corresponding up-sampling layer, providing position information about where the garment should be fused. The multiple SPADE layers may each perform one fusion after each decoding step.
Further, in addition to generating a 3-channel picture, the image fusion network may also generate a mask that marks position information. Thereafter, the final picture is obtained by alpha-fusing the 3-channel image with the original image, for example by multiplying the mask with the garment to obtain a texture and then superimposing the texture on the 3-channel picture. Through such a design, regions such as the trousers and the head are guaranteed to remain unchanged.
The above network may be used as the generator side of a generative adversarial network (GAN). The final picture and the real picture can be input into a discriminator for judgment, so as to further train the generative model.
For example, referring to fig. 6, two pictures, namely the ground truth and the output of the model, can be produced through the PixtoPix network. The ground-truth picture may be generated by exchanging the first image and the second image twice. The two pictures can be distinguished by the discriminator of the GAN, so that the generative model is further optimized. Through such training, the generated images become more realistic and a more effective generative model is obtained. The discriminator side of the GAN may be any existing image discriminator, and the present disclosure is not limited in this respect.
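A minimal sketch of one adversarial training step for such a generator is given below for illustration; the loss weights (including the Pix2Pix-style L1 weight of 100) and the conditioning passed to the discriminator are assumptions of this sketch, not the training recipe of the disclosed embodiments:

```python
# One adversarial training step: update the discriminator on a real pair and a
# generated pair, then update the generator to fool the discriminator while
# staying close to the ground-truth image.
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, inputs, real_image):
    # --- discriminator update -------------------------------------------------
    fake_image = generator(*inputs).detach()
    d_real = discriminator(inputs[0], real_image)
    d_fake = discriminator(inputs[0], fake_image)
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- generator update -----------------------------------------------------
    fake_image = generator(*inputs)
    g_adv = discriminator(inputs[0], fake_image)
    g_loss = F.binary_cross_entropy_with_logits(g_adv, torch.ones_like(g_adv)) + \
             100.0 * F.l1_loss(fake_image, real_image)       # Pix2Pix-style L1 term
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```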
A block diagram of an apparatus 700 for generating a fused image of human clothing according to the present disclosure is described below with reference to fig. 7.
The apparatus 700 may comprise a first segmentation unit 701, a second segmentation unit 702 and an image fusion unit 703. The first segmentation unit is used for generating a first human body segmentation map from the first image, where the first image is an image of a first human body wearing a first garment and the first human body segmentation map identifies different human body parts of the first human body in the first image. The second segmentation unit is used for generating a second human body segmentation map from the first image and a second image containing a second garment, where the second human body segmentation map identifies different human body parts of the first human body after the second garment is worn. The image fusion unit is used for generating, with an image fusion network, a fusion image of the first human body wearing the second garment from the second image, the first human body segmentation map and the second human body segmentation map.
According to some embodiments, the image fusion network is a PixtoPix network. According to some embodiments, the image fusion network is a fusion of a PixtoPix network and a U-Net network.
According to another aspect of the present disclosure, there is also provided a computing device, which may include: a processor; and a memory storing a program comprising instructions which, when executed by the processor, cause the processor to perform the above-described method of generating a fused image of a person's clothing.
According to yet another aspect of the present disclosure, there is also provided a computer-readable storage medium storing a program, which may include instructions that, when executed by a processor of a server, cause the server to perform the above-described method of generating a human body clothing fusion image.
Referring to fig. 8, a block diagram of a computing device 800, which may be a server or a client of the present disclosure and which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described.
Software elements (programs) may be located in the working memory 814 including, but not limited to, an operating system 816, one or more application programs 818, drivers, and/or other data and code. Instructions for performing the above-described methods and steps may be included in one or more applications 818, and the above-described methods may be implemented by the instructions of the one or more applications 818 being read and executed by the processor 804. Executable code or source code for the instructions of the software elements (programs) may also be downloaded from a remote location.
It will also be appreciated that various modifications may be made according to specific requirements. For example, customized hardware might be used, and/or particular elements might be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, some or all of the disclosed methods and apparatus may be implemented by programming hardware (e.g., programmable logic circuitry including Field Programmable Gate Arrays (FPGAs) and/or Programmable Logic Arrays (PLAs)) in an assembly language or a hardware description language such as VERILOG, VHDL or C++, using logic and algorithms according to the present disclosure.
It should also be understood that the foregoing method may be implemented in a server-client mode. For example, a client may receive data input by a user and send the data to a server. The client may also receive data input by the user, perform part of the processing in the foregoing method, and transmit the data obtained by the processing to the server. The server may receive data from the client and perform the aforementioned method or another part of the aforementioned method and return the results of the execution to the client. The client may receive the results of the execution of the method from the server and may present them to the user, for example, through an output device. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computing devices and having a client-server relationship to each other. The server may be a server of a distributed system or a server incorporating a blockchain. The server can also be a cloud server, or an intelligent cloud computing server or an intelligent cloud host with artificial intelligence technology.
It should also be understood that the components of computing device 800 may be distributed across a network. For example, some processes may be performed using one processor while other processes may be performed by another processor that is remote from the one processor. Other components of the computing device 800 may also be similarly distributed. As such, computing device 800 may be interpreted as a distributed computing system that performs processing at multiple locations.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents. Further, the steps may be performed in an order different from that described in the present disclosure, and various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.
Claims (20)
1. A method of generating a fused image of a body garment, comprising:
generating a first human body segmentation map according to a first image, wherein the first image is an image of a first human body wearing first clothes, and the first human body segmentation map identifies a human body part of the first human body in the first image;
generating a second human body segmentation map according to the first image and a second image containing a second garment, wherein the second human body segmentation map identifies the human body part of the first human body after the second garment is worn; and
inputting the first image, the second image, the first human body segmentation chart and the second human body segmentation chart into an image fusion network to generate a fusion image of the first human body wearing the second clothes.
2. The method of claim 1, wherein the image fusion network is a PixtoPix network.
3. The method of claim 1, wherein the image fusion network is a fusion network of a PixtoPix network and a U-net network.
4. The method of claim 1, wherein the image fusion network is obtained by training it as the generator side of a generative adversarial network.
5. The method of any of claims 1-4, wherein each of the first and second body segmentation maps identifies a portion covered by clothing and a skin portion.
6. The method of any of claims 1-4, wherein the image fusion network employs an attention mechanism in an upsampling layer.
7. The method of any of claims 1-4, wherein generating the fused image of the first person wearing the second garment further comprises:
generating a preliminary fused image and a mask using the second image, the first segmentation map and the second segmentation map, and
and performing alpha fusion on the preliminary fusion image and the first image through the mask to generate the fusion image.
8. The method of any of claims 1-4, wherein the second image is an image of the second garment being worn on a second person.
9. The method of claim 8, wherein inputting the first image and the first segmentation map into the image fusion network further comprises:
removing the area where the first clothing is located from the first human body segmentation graph to generate a first human body segmentation graph without clothing information; and
inputting the generated first human body segmentation map without clothing information into the image fusion network.
10. The method according to any one of claims 1-4, wherein inputting the second image, the first human body segmentation map and the second human body segmentation map into an image fusion network comprises:
inputting the second image and the first human body segmentation map into the image fusion network at a down-sampling stage, and inputting the second human body segmentation map into the image fusion network at an up-sampling stage.
11. The method of claim 1, wherein the second human segmentation map is input to the image fusion network by a SPADE algorithm.
12. The method of claim 11, wherein the image fusion network has a plurality of upsampling layers, and wherein inputting the second human body segmentation map into the image fusion network comprises: inputting the second human body segmentation map into the image fusion network through a SPADE algorithm between each pair of adjacent up-sampling layers.
13. The method of any of claims 1-4, wherein generating the first human segmentation map comprises: generating the first human segmentation map from the first image using a first neural network different from the image fusion network.
14. The method according to any one of claims 1-4, wherein generating the second body segmentation map comprises: generating the second body segmentation map from the first image and the second image using a second neural network different from the image fusion network.
15. An apparatus for generating a fused image of a body garment, comprising:
a first segmentation unit, configured to generate a first human body segmentation map according to a first image, wherein the first image is an image of a first human body wearing first clothes, and the first human body segmentation map identifies different human body parts of the first human body in the first image;
a second segmentation unit, configured to generate a second human body segmentation map according to the first image and a second image including a second garment, where the second human body segmentation map identifies different human body parts of the first human body after wearing the second garment; and
an image fusion unit, configured to generate, using an image fusion network, a fusion image of the first human body wearing the second garment according to the second image, the first human body segmentation map and the second human body segmentation map.
16. The apparatus of claim 15, wherein the image fusion network is a PixtoPix network.
17. The apparatus of claim 15, wherein the image fusion network is a fusion network of a PixtoPix network and a U-net network.
18. The apparatus of claim 15, wherein the image fusion network is obtained by training it as the generator side of a generative adversarial network.
19. A computing device, comprising:
a processor; and
a memory storing a program comprising instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-14.
20. A computer readable storage medium storing a program, the program comprising instructions that when executed by a processor of a computing device cause the computing device to perform the method of any of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011192303.XA CN112330580B (en) | 2020-10-30 | 2020-10-30 | Method, device, computing equipment and medium for generating human body clothing fusion image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011192303.XA CN112330580B (en) | 2020-10-30 | 2020-10-30 | Method, device, computing equipment and medium for generating human body clothing fusion image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112330580A (en) | 2021-02-05
CN112330580B CN112330580B (en) | 2024-08-13 |
Family
ID=74296898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011192303.XA Active CN112330580B (en) | 2020-10-30 | 2020-10-30 | Method, device, computing equipment and medium for generating human body clothing fusion image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330580B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269072A (en) * | 2021-05-18 | 2021-08-17 | 咪咕文化科技有限公司 | Picture processing method, device, equipment and computer program |
CN114663552A (en) * | 2022-05-25 | 2022-06-24 | 武汉纺织大学 | Virtual fitting method based on 2D image |
CN114913388A (en) * | 2022-04-24 | 2022-08-16 | 深圳数联天下智能科技有限公司 | Method for training fitting model, method for generating fitting image and related device |
WO2023051244A1 (en) * | 2021-09-29 | 2023-04-06 | 北京字跳网络技术有限公司 | Image generation method and apparatus, device, and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097564A (en) * | 2019-04-04 | 2019-08-06 | 平安科技(深圳)有限公司 | Image labeling method, device, computer equipment and storage medium based on multi-model fusion |
US20200143204A1 (en) * | 2018-11-01 | 2020-05-07 | International Business Machines Corporation | Image classification using a mask image and neural networks |
CN111275518A (en) * | 2020-01-15 | 2020-06-12 | 中山大学 | Video virtual fitting method and device based on mixed optical flow |
WO2020119311A1 (en) * | 2018-12-14 | 2020-06-18 | 深圳市商汤科技有限公司 | Neural network training method and image matching method and device |
CN111709874A (en) * | 2020-06-16 | 2020-09-25 | 北京百度网讯科技有限公司 | Image adjusting method and device, electronic equipment and storage medium |
CN111787242A (en) * | 2019-07-17 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Method and apparatus for virtual fitting |
-
2020
- 2020-10-30 CN CN202011192303.XA patent/CN112330580B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200143204A1 (en) * | 2018-11-01 | 2020-05-07 | International Business Machines Corporation | Image classification using a mask image and neural networks |
WO2020119311A1 (en) * | 2018-12-14 | 2020-06-18 | 深圳市商汤科技有限公司 | Neural network training method and image matching method and device |
CN110097564A (en) * | 2019-04-04 | 2019-08-06 | 平安科技(深圳)有限公司 | Image labeling method, device, computer equipment and storage medium based on multi-model fusion |
CN111787242A (en) * | 2019-07-17 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Method and apparatus for virtual fitting |
CN111275518A (en) * | 2020-01-15 | 2020-06-12 | 中山大学 | Video virtual fitting method and device based on mixed optical flow |
CN111709874A (en) * | 2020-06-16 | 2020-09-25 | 北京百度网讯科技有限公司 | Image adjusting method and device, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
PAN HUO et al.: "A part-based and feature fusion method for clothing classification", 《ADVANCES IN MULTIMEDIA INFORMATION PROCESSING》, 31 December 2016 (2016-12-31) *
SU Zhuo; YU Chunyang: "Virtual fitting algorithm based on 2D image transformation", Computer Technology and Development, no. 02, 19 October 2017 (2017-10-19) *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269072A (en) * | 2021-05-18 | 2021-08-17 | 咪咕文化科技有限公司 | Picture processing method, device, equipment and computer program |
CN113269072B (en) * | 2021-05-18 | 2024-06-07 | 咪咕文化科技有限公司 | Picture processing method, device, equipment and computer program |
WO2023051244A1 (en) * | 2021-09-29 | 2023-04-06 | 北京字跳网络技术有限公司 | Image generation method and apparatus, device, and storage medium |
CN114913388A (en) * | 2022-04-24 | 2022-08-16 | 深圳数联天下智能科技有限公司 | Method for training fitting model, method for generating fitting image and related device |
CN114913388B (en) * | 2022-04-24 | 2024-05-31 | 深圳数联天下智能科技有限公司 | Method for training fitting model, method for generating fitting image and related device |
CN114663552A (en) * | 2022-05-25 | 2022-06-24 | 武汉纺织大学 | Virtual fitting method based on 2D image |
CN114663552B (en) * | 2022-05-25 | 2022-08-16 | 武汉纺织大学 | Virtual fitting method based on 2D image |
Also Published As
Publication number | Publication date |
---|---|
CN112330580B (en) | 2024-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112330580B (en) | Method, device, computing equipment and medium for generating human body clothing fusion image | |
US11573641B2 (en) | Gesture recognition system and method of using same | |
CN111787242B (en) | Method and apparatus for virtual fitting | |
US20190213792A1 (en) | Providing Body-Anchored Mixed-Reality Experiences | |
CN114303120A (en) | Virtual keyboard | |
KR20180126561A (en) | Create an automated avatar | |
US11176723B2 (en) | Automated dance animation | |
KR102506738B1 (en) | snow texture inpainting | |
US11222455B2 (en) | Management of pseudorandom animation system | |
JP7421010B2 (en) | Information display method, device and storage medium | |
CN118247348A (en) | Method for determining pose of first wide-angle image, data processing system and non-transitory machine readable medium | |
CN116311519B (en) | Action recognition method, model training method and device | |
US20240331305A1 (en) | Virtual clothing changing method, apparatus, electronic device and readable medium | |
CN114550313B (en) | Image processing method, neural network, training method, training device and training medium thereof | |
CN116030185A (en) | Three-dimensional hairline generating method and model training method | |
CN114119935B (en) | Image processing method and device | |
EP3692511A2 (en) | Customizing appearance in mixed reality | |
CN116245998B (en) | Rendering map generation method and device, and model training method and device | |
CN115661375B (en) | Three-dimensional hair style generation method and device, electronic equipment and storage medium | |
CN114120412B (en) | Image processing method and device | |
CN114120448B (en) | Image processing method and device | |
KR20240128015A (en) | Real-time clothing exchange | |
CN114119154A (en) | Virtual makeup method and device | |
CN117422831B (en) | Three-dimensional eyebrow shape generating method and device, electronic equipment and storage medium | |
CN115423827B (en) | Image processing method, image processing device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20240715 Address after: Room 501, No. 16-2 Wanghai Road, Siming District, Xiamen City, Fujian Province 361000 Applicant after: Xiamen Wozhuan Technology Co.,Ltd. Country or region after: China Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085 Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Country or region before: China |
|
GR01 | Patent grant | ||
GR01 | Patent grant |