WO2023109666A1 - Virtual dress-up method, device, electronic equipment and readable medium - Google Patents

Virtual dress-up method, device, electronic equipment and readable medium

Info

Publication number
WO2023109666A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
clothing
network
sample
Prior art date
Application number
PCT/CN2022/137802
Other languages
English (en)
French (fr)
Inventor
董欣
张惜今
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023109666A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation

Definitions

  • Embodiments of the present disclosure relate to the technical field of image processing, for example, to a virtual dress-up method, device, electronic equipment, and readable medium.
  • the virtual dressing method in the related art is just a simple clothing replacement: the clothing area in a specified image is segmented and transferred onto the person in another image, and the clothing area may simply be scaled, rotated, etc. during the process.
  • however, the body shapes and poses of different characters are complex and diverse. Migrating the same clothing to people with different body shapes or poses introduces large errors, resulting in a mismatch between the clothing area and the character's pose, relatively rough handling of the joints between the character's body parts, and so on, so the approach has strong limitations.
  • the present disclosure provides a virtual dressing method, device, electronic equipment and readable medium, so as to improve the flexibility and accuracy of virtual dressing.
  • an embodiment of the present disclosure provides a method for virtual dressing, including:
  • acquiring a source image and a target image, wherein the source image includes target clothing associated with a first character instance and the target image includes a second character instance; processing the source image and the target image to obtain first portrait information and first posture information of the source image, and second posture information of the target image, respectively; and transforming, according to the first portrait information, the first posture information and the second posture information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  • the embodiment of the present disclosure also provides a virtual dressing method, including:
  • acquiring a target image and determining target clothing; and transforming, through a preset network, the clothing of a character instance in the target image into the target clothing to obtain a dress-up image; wherein the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to posture information of the sample target image.
  • the embodiment of the present disclosure also provides a virtual dressing device, including:
  • the first acquisition module is configured to acquire a source image and a target image, the source image includes the target clothing associated with the first character instance, and the target image includes the second character instance;
  • a processing module configured to process the source image and the target image to respectively obtain first portrait information and first pose information of the source image, and second pose information of the target image;
  • the first dressing module is configured to transform, according to the first portrait information, the first posture information and the second posture information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  • the embodiment of the present disclosure also provides a virtual dressing device, including:
  • the second determining module is configured to acquire the target image and determine the target clothing
  • the second dressing module is configured to transform the clothing of the character instance in the target image into the target clothing through a preset network to obtain a dressing image.
  • an embodiment of the present disclosure further provides an electronic device, including:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the virtual dress-up method provided by the embodiments of the present disclosure.
  • the embodiments of the present disclosure further provide a computer-readable medium, on which a computer program is stored, and when the program is executed by a processor, the virtual dress-up method provided by the embodiments of the present disclosure is implemented.
  • FIG. 1 is a schematic flow diagram of a virtual dress-up method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a virtual dress-up method provided by another embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an input branch of a segmentation auxiliary network provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of an implementation of generating segmentation assistance information based on a segmentation assistance network according to an embodiment of the present disclosure
  • Fig. 5 is a schematic diagram of an input branch of a clothing displacement auxiliary network provided by an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of an implementation of generating a pixel displacement channel map based on a clothing displacement auxiliary network provided by an embodiment of the present disclosure
  • FIG. 7 is a schematic flowchart of a virtual dressing method provided by another embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of background completion based on a background completion network provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of an implementation of generating a dress-up image based on a dress-up network according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic flow diagram of training a dress-up network provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of an implementation of reconstructing a sample source image based on a reconstruction network provided by an embodiment of the present disclosure
  • FIG. 12 is a schematic diagram of an implementation of training a dress-up network provided by an embodiment of the present disclosure.
  • Fig. 13 is a schematic flowchart of a virtual dressing method provided by another embodiment of the present disclosure.
  • Fig. 14 is a schematic structural diagram of a virtual dressing device provided by an embodiment of the present disclosure.
  • Fig. 15 is a schematic structural diagram of a virtual dressing device provided by another embodiment of the present disclosure.
  • Fig. 16 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • each embodiment provides both example features and examples; multiple features recorded in the embodiments can be combined to form multiple example solutions, and each numbered embodiment should not be regarded as defining only a single technical solution.
  • the embodiments in the present disclosure and the features in the embodiments can be combined with each other if there is no conflict.
  • Fig. 1 is a schematic flow chart of a virtual dressing method provided by an embodiment of the present disclosure. This method can be applied to transform the target clothing according to the portrait information and posture information of different images, so as to realize the virtual change of clothes between different characters.
  • the method can be performed by a virtual dressing device, wherein the device can be implemented by software and/or hardware, and is generally integrated on an electronic device with image processing capabilities.
  • the electronic device includes, but is not limited to, devices such as desktop computers, laptop computers, servers, tablet computers or smartphones.
  • a virtual dress-up method provided by an embodiment of the present disclosure includes the following steps:
  • the source image may refer to an image that provides the target clothing and includes the first character instance, for example, an image of a model wearing the target clothing, and the model is the first character instance.
  • the source image may be acquired through a preset database, wherein the preset database may refer to a database in which a large number of images of models wearing the same or different clothes are pre-stored.
  • the target image may refer to an image containing an instance of the second character to be changed. For example, a user who wants to change clothes may upload a photo of himself, and use the photo uploaded by the user as the acquired target image.
  • the character instance can be regarded as referring to the character object contained in the image.
  • the target clothing can be thought of as the clothing provided by the source image for transformation onto the person instance in the target image.
  • the source image and the target image can not only contain the characteristics of the corresponding character instance and its corresponding clothing (such as the model and user themselves and their clothing), but also contain the characteristics of the background where the corresponding character instance is located.
  • the first person instance may refer to a person object contained in the source image.
  • the second person instance may refer to a person object contained in the target image.
  • S120 Process the source image and the target image to obtain first portrait information and first pose information of the source image, and second pose information of the target image, respectively.
  • processing may refer to performing corresponding image processing operations on the source image and the target image to obtain corresponding portrait information and pose information.
  • for example, instance segmentation or semantic segmentation may be performed on the source image and the target image, the pose of the person instance in each image may be recognized, human body key points may be extracted from each image, and the person instances in the source image and the target image may be parsed to determine different body parts, clothing features, and so on.
  • portrait information can refer to the image obtained after feature extraction of the character instance and clothing contained in the source image and the target image.
  • the main purpose is to determine the outline of the character instance, the exposed body parts, the body parts covered by clothing, and so on.
  • Portrait information may include portrait segmentation information and portrait analysis information.
  • the portrait segmentation information may refer to an image obtained by segmenting the image according to the overall outline of the person instance and clothing contained in the image, and the main purpose is to determine the outline of the person instance. It should be noted that, for the first person instance in the source image, the portrait segmentation information mainly refers to the outline of the first person instance wearing the target clothing, that is, the outline needs to be determined considering the outline of the target clothing.
  • Portrait analysis information can refer to the image obtained by contour segmentation and part parsing of the character instance and clothing contained in the image, for example analyzing and distinguishing exposed body parts such as the limbs, head and feet of the character instance, and analyzing and distinguishing clothing parts such as full-body garments, tops and bottoms; different parts in the portrait analysis information can be represented by different colors or textures.
  • Pose information can refer to the corresponding pixel point information obtained after performing human body joint point positioning processing on the person instances contained in the source image and the target image, and multiple pixel point information can constitute an image representing a pose.
  • the attitude information may include three-dimensional (3-Dimension, 3D) attitude information and two-dimensional (2-Dimension, 2D) attitude information.
  • 3D pose information may refer to an image representing the three-dimensional pose of a character instance formed by the 3D pixel coordinates in the above-mentioned pixel point information, which may be represented by a schematic human body with the corresponding pose; 2D pose information may refer to an image representing the pose formed by the 2D pixel coordinates in the above-mentioned pixel point information, which may be represented by the key points, or the connection lines between key points, conforming to the corresponding pose.
  • the first portrait information and the first pose information may be considered to refer to portrait information and pose information obtained after processing the source image, respectively.
  • the second pose information may be considered to refer to pose information obtained after processing the target image.
  • the specific method for processing the source image and the target image is not limited, and a corresponding image processing algorithm may be flexibly selected according to actual requirements.
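  • as a purely illustrative sketch (not part of the disclosure), off-the-shelf detection models can supply the kind of person mask and 2D body key points used here as portrait segmentation information and human body key point information; the torchvision model names and the score threshold below are assumptions, and any equivalent segmentation and key point algorithm could be substituted.

```python
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn, maskrcnn_resnet50_fpn

def extract_portrait_and_pose(image: torch.Tensor, score_thr: float = 0.7):
    """Hypothetical helper: extract a person mask (portrait segmentation information)
    and 2D body key points (pose information) from one RGB image tensor in [0, 1]."""
    seg_model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
    kpt_model = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()
    with torch.no_grad():
        seg_out = seg_model([image])[0]   # dict with boxes, labels, scores, masks
        kpt_out = kpt_model([image])[0]   # dict with boxes, scores, keypoints
    # Keep the most confident person instance (COCO label 1) from each model.
    person = (seg_out["labels"] == 1) & (seg_out["scores"] > score_thr)
    mask = seg_out["masks"][person][:1]   # 1 x 1 x H x W soft person mask
    keypoints = kpt_out["keypoints"][:1]  # 1 x 17 x 3, each row (x, y, visibility)
    return mask, keypoints
```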
  • the clothing of the second character instance may be considered to refer to the clothing worn by the character instance in the target image.
  • the principle of the transformation is mainly to analyze the pose change between the first character instance and the second character instance according to the first portrait information, the first posture information and the second posture information, so as to obtain the displacement required to transform the target clothing associated with the first character instance onto the posture of the second character instance; the target clothing in the source image is changed according to this required displacement so as to replace the clothing of the second character instance in the target image.
  • in this way the target clothing can be fused to the pose of the second character instance, yielding an image of the target clothing in the pose of the second character instance, from which the corresponding dress-up image can be obtained.
  • the protection area of the second person instance can also be determined according to the second portrait information of the target image, and the features within the protection area are kept unchanged.
  • the second portrait information may refer to portrait information obtained after processing the target image.
  • the protected area may refer to exposed body parts such as the face, hands, and feet of the second person instance in the target image.
  • the second portrait information can be used to segment and parse the second person instance, determine the protection area of the second person instance, and keep the features in the protection area unchanged, so as to ensure that the user can see the complete and accurate appearance of the dressed-up person in the target clothing.
  • the second character instance after dressing can be displayed on a specified background, such as a white background, or can be blended into the original background of the target image. In the latter case, operations such as matting and background completion can be performed on the target image: the original second person instance is first cut out of the image, the background of the matted target image is then completed through a corresponding image processing method, and finally the dressed-up second character instance is fused into the background-completed target image to obtain the corresponding dress-up image, which makes the dress-up image more realistic and accurate.
  • the source image and the target image are first obtained, the source image includes the target clothing associated with the first character instance, and the target image includes the second character instance; and then the source image and the target The image is processed to obtain the first portrait information and the first posture information of the source image, and the second posture information of the target image; finally, according to the first portrait information, the first posture information and the second posture information, the target image is The clothing of the second character instance in is transformed into the target clothing in the source image, and the dressing image is obtained.
  • the above method can realize the transformation of the target clothing between different characters by using the portrait information and posture information of the source image and the target image, that is, virtual dressing between different characters, thus improving the flexibility and accuracy of virtual dress-up.
  • FIG. 2 is a schematic flowchart of a virtual dress-up method provided by another embodiment of the present disclosure. This embodiment is refined based on the example solutions in the above-mentioned embodiments.
  • in this embodiment, the process of transforming the clothing of the second character instance in the target image into the target clothing in the source image to obtain the dress-up image is described in detail. Please refer to the foregoing embodiments for details not exhaustively described in this embodiment.
  • a virtual dress-up method provided by an embodiment of the present disclosure includes the following steps:
  • S220 Process the source image and the target image to obtain first portrait information and first pose information of the source image, and second pose information of the target image, respectively.
  • the segmentation auxiliary network may refer to a neural network for segmenting contour feature information of character instances and clothing in an image, such as a UNet network.
  • the segmentation auxiliary information may refer to the output obtained by the segmentation auxiliary network based on the first portrait information, the first pose information, and the second pose information, and is used to reflect the characteristics of the outline of the target clothing under the pose of the second character instance, the output result It can be represented by a mask (Mask), so the segmentation auxiliary network can also be understood as a Mask auxiliary network.
  • the auxiliary segmentation information may include portrait segmentation information and portrait analysis information of the target clothing in the pose of the second character instance.
  • the segmentation auxiliary information can be considered as being obtained by the Mask auxiliary network analyzing, according to the first portrait information, the first posture information and the second posture information, the posture displacement change between the first and second character instances; based on this posture displacement change, the outline of the target clothing can be correspondingly displaced and then fused with the posture of the second character instance.
  • the previously obtained portrait information and pose information already take into account both the character instance itself and the outline of the target clothing, so when the posture displacement changes, the outline of the target clothing can also be changed so as to adapt to the pose of the second character instance.
  • the segmentation auxiliary network is a neural network including a dual-branch input: the input of the first branch includes the first portrait information and the first posture information, wherein the first portrait information includes the first portrait segmentation information and the first portrait analysis information, and the first posture information includes the three-dimensional human body posture information and the human body key point information of the first character instance; the input of the second branch includes the second posture information, wherein the second posture information includes the three-dimensional human body posture information and the human body key point information of the second character instance.
  • the dual-branch input can be regarded as two input branches, that is, the segmentation auxiliary network can be a neural network including two input branches.
  • the first branch can be considered as the input branch corresponding to the source image, and the second branch can be considered as the input branch corresponding to the target image.
  • the first portrait segmentation information and the first portrait analysis information may respectively refer to portrait segmentation information and portrait analysis information obtained after processing the source image.
  • the three-dimensional human body posture information may refer to an image representing a three-dimensional posture obtained according to 3D pixel point coordinate information corresponding to the human body joint points of the character instance in the image.
  • the key point information of the human body may refer to an image representing a gesture formed according to 2D pixel point coordinate information corresponding to the joint points of the human body of the character instance in the image.
  • Each character instance can have corresponding 3D human body posture information and human body key point information.
  • the input of the first branch may include the first portrait information and the first pose information; the input of the second branch may include at least the second pose information, and may also include the second portrait information, which is not limited here.
  • Fig. 3 is a schematic diagram of an input branch of a segmentation auxiliary network provided by an embodiment of the present disclosure.
  • the first input branch represents the input branch corresponding to the source image, where s1 and s2 represent the first portrait segmentation information and the first portrait analysis information, respectively, and s3 and s4 represent the three-dimensional Human body pose information and human body key point information;
  • the second input branch represents the input branch corresponding to the target image, where d1 and d2 represent the 3D human body pose information and human body key point information of the second character instance, respectively.
  • FIG. 4 is a schematic diagram of an implementation of generating segmentation assistance information based on a segmentation assistance network according to an embodiment of the present disclosure.
  • src represents the input branch corresponding to the source image, that is, the first input branch
  • dst represents the input branch corresponding to the target image, that is, the second input branch.
  • the encoder part can be used to extract features from the information of each input branch and encode it accordingly; the decoder part can be used to decode the encoded information processed by the residual blocks; the encoder and decoder parts include convolution kernels of multiple different sizes, for example 1×1, 3×3, 5×5 and 7×7; and the residual blocks can be used to perform corresponding image processing on the information passed on by the encoder.
  • in the first input branch in the figure, the two portrait inputs respectively represent the first portrait segmentation information and the first portrait analysis information, and the two pose inputs respectively represent the three-dimensional human body pose information and the human body key point information of the first person instance.
  • D t and J t represent the 3D human body pose information and human key point information of the second character instance, respectively.
  • in the output result of the Mask auxiliary network, that is, in the segmentation auxiliary information, M_t represents the portrait segmentation information of the target clothing under the posture of the second person instance, and S_t represents the portrait analysis information of the target clothing under the posture of the second person instance.
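  • the exact architecture of the Mask auxiliary network is not fixed by the disclosure; the following is a minimal sketch assuming a dual-branch encoder, a residual bottleneck and a shared decoder in the spirit of FIG. 4, with channel sizes, layer counts and the number of output channels chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

def down(c_in, c_out):
    # One encoder stage: strided convolution halves the spatial resolution.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
                                  nn.Conv2d(c, c, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class SegAuxNet(nn.Module):
    """Dual-branch sketch: the src branch encodes portrait + pose information of the
    source image, the dst branch encodes pose information of the target image; the
    decoder predicts mask-like segmentation auxiliary information (e.g. M_t, S_t)."""
    def __init__(self, src_ch, dst_ch, out_ch=2):
        super().__init__()
        self.enc_src = nn.Sequential(down(src_ch, 64), down(64, 128))
        self.enc_dst = nn.Sequential(down(dst_ch, 64), down(64, 128))
        self.bottleneck = nn.Sequential(*[ResBlock(256) for _ in range(4)])
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, out_ch, 3, padding=1))
    def forward(self, src, dst):
        z = torch.cat([self.enc_src(src), self.enc_dst(dst)], dim=1)
        return self.dec(self.bottleneck(z))
```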
  • the clothing displacement (Clothflow) auxiliary network may refer to a neural network for predicting displacement changes of clothing pixels, such as a UNet network.
  • the main principle is that by analyzing the posture displacement change between the first character instance and the second character instance, the target clothing can be fused with the posture of the second character instance in the target image by changing the pixel point displacement of the target clothing.
  • the pixel displacement channel map can refer to the output obtained by the Clothflow auxiliary network predicting from the first portrait information, the first posture information, the second posture information and the segmentation auxiliary information; it is used to reflect the amount of displacement required for the target clothing to fit the pose of the second character instance, and can take the form of a 2-channel image.
  • the 2-channel image can be considered as an image representing the displacement change of each pixel coordinate (x, y) of the target clothing.
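  • as an illustration of how such a 2-channel map can be applied, the sketch below warps a clothing image with a per-pixel displacement given in pixel units; the backward-sampling convention via grid_sample is an assumption made for the example, not a requirement of the disclosure.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(clothing: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp an image (B x C x H x W) by a 2-channel pixel displacement map
    (B x 2 x H x W, pixel units): output(x, y) samples clothing at (x + dx, y + dy)."""
    b, _, h, w = clothing.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(clothing)  # 1 x 2 x H x W
    coords = base + flow                                                    # B x 2 x H x W
    # Normalise to [-1, 1] as expected by grid_sample (x before y, last dimension).
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                                    # B x H x W x 2
    return F.grid_sample(clothing, grid, align_corners=True)
```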
  • the clothing displacement auxiliary network is a neural network including a double-branch input
  • the input of the first branch includes the first portrait information and the first posture information, wherein the first portrait information includes the first portrait segmentation information and the first Portrait analysis information, the first posture information includes the 3D human body posture information and human body key point information of the first character instance;
  • the input of the second branch includes the segmentation auxiliary information and the second posture information, where the segmentation auxiliary information includes the second portrait segmentation information and the second portrait analysis information of the target clothing under the posture of the second person instance, and the second posture information includes the three-dimensional human body posture information and the human body key point information of the second person instance.
  • the second portrait segmentation information and the second portrait analysis information may respectively refer to portrait segmentation information and portrait analysis information obtained after processing the target image.
  • Fig. 5 is a schematic diagram of an input branch of a clothing displacement auxiliary network provided by an embodiment of the present disclosure.
  • the first input branch represents the input branch corresponding to the source image
  • the second input branch represents the input branch corresponding to the target image.
  • the second portrait segmentation information and the second portrait parsing information are segmentation auxiliary information output by the Mask auxiliary module.
  • the human body key line information is based on a parametric human body model (Skinned Multi-Person Linear model, SMPL), and can be intuitively understood as posture information composed of the connection lines between the key points in the human body key point information.
  • the key line information of the human body can also be added as input to improve the accuracy of human body posture analysis.
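  • one simple way to rasterise such key line information as an extra input channel is sketched below; the COCO-style limb connection table is a hypothetical stand-in for whatever key line definition the parametric human body model actually provides.

```python
import numpy as np
import cv2

# Hypothetical COCO-style limb connections between 17 body key points.
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10), (5, 6), (11, 13), (13, 15),
         (12, 14), (14, 16), (11, 12), (5, 11), (6, 12)]

def keyline_map(keypoints: np.ndarray, h: int, w: int) -> np.ndarray:
    """Draw the connection lines between body key points (17 x 3 array of
    x, y, visibility) into a single-channel image usable as an extra input."""
    canvas = np.zeros((h, w), dtype=np.uint8)
    for a, b in LIMBS:
        if keypoints[a, 2] > 0 and keypoints[b, 2] > 0:   # both endpoints visible
            pa = (int(round(keypoints[a, 0])), int(round(keypoints[a, 1])))
            pb = (int(round(keypoints[b, 0])), int(round(keypoints[b, 1])))
            cv2.line(canvas, pa, pb, color=255, thickness=3)
    return canvas
```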
  • FIG. 6 is a schematic diagram of an implementation of generating a pixel displacement channel map based on a clothing displacement auxiliary network provided by an embodiment of the present disclosure.
  • the first portrait information and the first pose information are input to the first branch, and the output result of the Mask segmentation auxiliary network (that is, the segmentation auxiliary information) and the second pose information are input to the second branch, Through the corresponding processing of the Clothflow auxiliary network, the output result obtained is the pixel displacement channel map F p of the target clothing transformed from the first character instance to the second character instance.
  • the method for processing multiple input information by the Clothflow auxiliary network is not described in detail here.
  • the pose displacement channel map may refer to a 2-channel image used to reflect changes in pose displacement between the first and second character instances.
  • the posture information may include three-dimensional human body posture information, human body key point information and/or human body key line information.
  • according to the pose displacement channel map, the pixel displacement channel map and the segmentation auxiliary information, the pose displacement change, the pixel displacement change of the target clothing and the characteristics of the target clothing under the pose of the second person instance can be integrated, and the clothing of the second character instance can be transformed into the target clothing in the source image to obtain a dress-up image; this dress-up process can also be realized through a neural network.
  • a virtual dress-up method provided by an embodiment of the present disclosure embodies the process of obtaining a dress-up image according to portrait information and posture information.
  • in this method, through the segmentation auxiliary network and the clothing displacement auxiliary network, the corresponding segmentation and displacement-change predictions are carried out on the portrait information and the posture information, and the clothing is changed according to the resulting pose displacement channel map, pixel displacement channel map and segmentation auxiliary information, so as to realize the displacement change and posture fusion of the target clothing under the posture changes of different character instances, which improves the flexibility and accuracy of virtual dress-up.
  • FIG. 7 is a schematic flow chart of a virtual dress-up method provided by another embodiment of the present disclosure. This embodiment is refined based on the example solutions in the above-mentioned embodiments.
  • in this embodiment, the process of transforming the clothing of the second person instance in the target image into the target clothing in the source image according to the pose displacement channel map, the pixel displacement channel map and the segmentation auxiliary information to obtain the dress-up image is described in detail. Please refer to any of the foregoing embodiments for details not exhaustively described in this embodiment.
  • a virtual dress-up method provided by an embodiment of the present disclosure includes the following steps:
  • the instance segmentation algorithm may refer to an algorithm for pixel-level segmentation of certain types of object instances in an image.
  • the background segmentation map of the target image can be considered as the remaining image obtained by removing the person instance in the target image through the instance segmentation algorithm.
  • the background completion network may refer to a neural network that performs background completion on blank areas removed from the background segmentation map.
  • the principle of background completion mainly refers to filling the blank area according to the texture or features around the blank area in the background segmentation map.
  • the corresponding regional background is generated through the background completion network, and then the generated regional background is filled into the blank position in the background segmentation map of the target image , to complete the blank area to get a complete background.
  • the background segmentation image of the target image after background completion is the first background image.
  • FIG. 8 is a schematic diagram of background completion based on a background completion network provided by an embodiment of the present disclosure.
  • in the figure, G_B represents the background completion network, and the other two images respectively represent the background segmentation map of the target image and the first background image obtained after background completion.
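  • the background completion network G_B is a learned model; purely as a stand-in that illustrates the same idea of filling the removed-person region from the surrounding texture, classical inpainting can be used, as in the sketch below (the mask handling and inpainting radius are illustrative assumptions).

```python
import cv2
import numpy as np

def complete_background(target_bgr: np.ndarray, person_mask: np.ndarray) -> np.ndarray:
    """Fill the region left blank after removing the person instance.
    target_bgr: H x W x 3 uint8 image; person_mask: H x W, non-zero where the person was.
    A classical inpainting call stands in for the learned completion network G_B."""
    mask = (person_mask > 0).astype(np.uint8) * 255
    # Dilate slightly so the fill also covers the instance boundary.
    mask = cv2.dilate(mask, np.ones((7, 7), np.uint8))
    return cv2.inpaint(target_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```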
  • the dressing network may refer to a neural network capable of changing clothes between different character instances.
  • through the dressing network, the target clothing can be made to undergo a relative displacement change according to the posture of the second character instance in the target image, so that in the process of transforming the clothing of the second person instance in the target image into the target clothing, the displaced target clothing is fused to the posture of the second person instance, thereby obtaining the dressed-up second character instance.
  • transforming the clothing of the second character instance in the target image into the target clothing in the source image to obtain the dressed-up second character instance includes: determining the pose displacement channel map according to the first posture information and the second posture information; superimposing the pose displacement channel map and the pixel displacement channel map to obtain a combined displacement channel map; and inputting the segmentation auxiliary information and the combined displacement channel map into the dressing network, so as to transform, through the dressing network, the clothing of the second character instance in the target image into the target clothing in the source image and obtain the dressed-up second character instance.
  • the superposition may refer to fusing the feature information of the attitude displacement channel map with the feature information of the pixel displacement channel map.
  • the segmentation auxiliary information obtained through the Mask auxiliary network and the combined displacement channel map obtained by superposition are input into the dressing network. In addition, the feature information of the protection area of the second character instance can also be input into the dressing network, in order to ensure that the features of the protection area of the dressed-up second character instance are kept unchanged. The input information is then processed through the dressing network, and finally the clothing of the second character instance in the target image is transformed into the target clothing in the source image, obtaining the dressed-up second character instance.
  • the second character instance after dressing is fused to the corresponding position in the first background image to obtain the dressing image.
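  • this fusion can be as simple as mask-based alpha compositing; the sketch below assumes a soft person mask for the dressed-up second character instance is available and shows only one possible realisation.

```python
import torch

def composite(person_rgb: torch.Tensor, person_mask: torch.Tensor,
              background_rgb: torch.Tensor) -> torch.Tensor:
    """Fuse the dressed-up person instance into the completed background.
    person_rgb / background_rgb: B x 3 x H x W; person_mask: B x 1 x H x W in [0, 1]."""
    return person_mask * person_rgb + (1.0 - person_mask) * background_rgb
```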
  • FIG. 9 is a schematic diagram of an implementation of generating a dress-up image based on a dress-up network provided by an embodiment of the present disclosure.
  • I s represents the source image
  • I t represents the target image
  • D s represents the three-dimensional human body posture information of the first person instance in the source image
  • D t represents the three-dimensional human body posture information of the second person instance in the target image
  • F v represents the attitude displacement channel map determined according to D s and D t
  • F p represents the pixel displacement channel map.
  • a virtual dressing method provided by an embodiment of the present disclosure embodies the process of changing clothes through a dressing network.
  • the background of the target image is completed through the background completion network, and the second character instance after the change is fused into the background image of the target image after the completion.
  • by maintaining the background characteristics during the dress-up, the fineness of the virtual dress-up can be further improved, which improves the user experience of virtual dress-up.
  • FIG. 10 is a schematic flow diagram of training a dress-up network provided by an embodiment of the present disclosure. This embodiment is refined based on multiple example solutions in the foregoing embodiments. In this embodiment, the process of training and testing the dress-up network is described in detail. Please refer to any of the foregoing embodiments for details not exhaustively described in this embodiment.
  • a virtual dress-up method provided by an embodiment of the present disclosure includes the following steps:
  • the sample target image may refer to a sample image including clothing to be changed and corresponding character instances. There may be multiple sample target images, for example, multiple sample images with different person instance poses.
  • the background segmentation map of the sample target image is obtained through an instance segmentation algorithm.
  • background complementation is performed on the background segmentation map of the sample target image obtained above through a background complementation network to obtain a corresponding second background map.
  • the sample source image can be regarded as a sample image for providing target clothing and containing a character instance.
  • the clothing in the sample target image is transformed into the target clothing in the sample source image through the dressing network, and the person instance after dressing is obtained.
  • the result of the costume change may refer to the costume change image obtained by fusing the character instance image after the costume change with the second background image.
  • the sample dress-up image can be regarded as a standard image in which the target clothing in the sample source image is transformed into the character instance in the sample target image.
  • the sample source image, sample target image, and sample dress-up image are all known, and can be downloaded from the sample database, or can be obtained from real collected images, for example, obtained by taking pictures of scenes where different characters are actually dressed.
  • the dress-up result can be regarded as a predicted value output by the dress-up network
  • the sample dress-up image can be regarded as a real value after the dress-up.
  • the loss function can be used to characterize the degree of difference between the dress-up result output by the dress-up network (i.e., the predicted value) and the sample dress-up image (i.e., the real value). For example, the larger the loss function, the greater the gap between the dress-up result and the sample dress-up image; the smaller the loss function, the smaller the gap between the dress-up result and the sample dress-up image and the closer it is to the real value, indicating that the robustness of the dress-up network is better.
  • the first condition can be a set loss threshold range; for example, when the loss function falls within the set threshold range, the corresponding dress-up result has a small gap with the sample dress-up image, which meets the training requirements of the dress-up network.
  • the dress-up network is trained based on sample source images, sample target images, and sample dress-up images. During each training iteration, if the loss function between the dress-up result and the sample dress-up image satisfies the first condition, the training of the dress-up network ends; otherwise, if the first condition is not met, return to S410 and continue the iterative training of the dress-up network until the loss function satisfies the first condition.
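  • a minimal training loop matching this description is sketched below; the L1 loss, the fixed loss threshold standing in for the first condition and the two-argument call signature of the dress-up network are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def train_dressup(dressup_net, loader, loss_threshold=0.05, max_epochs=100, lr=2e-4):
    """Iterate until the loss between the dress-up result (prediction) and the
    sample dress-up image (ground truth) satisfies the first condition, modelled
    here as the mean L1 loss over an epoch falling below a fixed threshold."""
    opt = torch.optim.Adam(dressup_net.parameters(), lr=lr)
    criterion = nn.L1Loss()
    for epoch in range(max_epochs):
        running = 0.0
        for sample_src, sample_tgt, sample_gt in loader:
            pred = dressup_net(sample_src, sample_tgt)   # dress-up result
            loss = criterion(pred, sample_gt)            # vs. sample dress-up image
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        if running / len(loader) < loss_threshold:       # first condition satisfied
            break
```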
  • any two images are input at the input end of the dressing network, where the two images can be the sample source image and the sample target image in the original sample, or any two images acquired in another way.
  • S460 includes: transforming the clothing in the sample source image onto the person instance in the sample target image to obtain an intermediate image, and transforming the clothing in the intermediate image onto the person instance in the sample source image to obtain a result image; the test result is determined according to the error between the result image and the sample source image.
  • the error can be a value representing the degree of difference between the feature information of the result image and that of the sample source image. If the error is large, it indicates that the test result of the dress-up network is not good and the dress-up network needs further training; if the error is small, or smaller than a certain threshold, it indicates that the test effect of the dress-up network is good, and the training and testing of the dress-up network end.
  • the input image for testing can be a sample source image and a sample target image, transform the clothing of the character instance A in the sample source image to the character instance B in the sample target image to obtain an intermediate image, and the character instance in the intermediate image is recorded as is B', and then transform the clothing of the character instance B' in the intermediate image to the character instance A in the sample source image to obtain the result image, and the character instance in the result image is recorded as A'.
  • the test result can be determined by judging the error between the result image and the sample source image (mainly the error between character instances A and A').
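  • the A to B' and back to A' test can be expressed compactly as below; the two-argument call signature of the dress-up network and the mean absolute pixel error used as the error measure are assumptions made for illustration.

```python
import torch

def cycle_test(dressup_net, source_img, target_img, err_threshold=0.02):
    """Swap the clothing from the source onto the target to get an intermediate image,
    swap it back onto the source person, and measure how far the result drifts from
    the original source image."""
    with torch.no_grad():
        intermediate = dressup_net(source_img, target_img)  # B' wearing A's clothing
        result = dressup_net(intermediate, source_img)      # swap the clothing back onto A
        error = torch.mean(torch.abs(result - source_img)).item()
    return error, error < err_threshold
```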
  • the dress-up network can also share the parameters of the reconstruction network.
  • the reconstruction network can be considered as a neural network that reconstructs the background complement image with the character instance removed based on the portrait information and pose information.
  • the reconstruction network is used to reconstruct the source image, and the feature transformation rules learned in the reconstruction network are provided to the dressing network in the form of network parameters, assisting the training and learning of the dressing network; on the basis of improving the training efficiency of the dressing network, this can also improve the performance of the dress-up network.
  • the reconstruction network is used in the training process of the dress-up network, and in each iteration the following operations are performed until the loss function between the reconstructed image and the sample source image satisfies the second condition, at which point the network parameters of the reconstruction network are shared with the dress-up network: the background segmentation map of the sample source image is obtained through the instance segmentation algorithm; the background segmentation map of the sample source image is completed through the background completion network to obtain the third background map; and, through the reconstruction network, image reconstruction is performed according to the portrait information of the sample source image, the intersection of the sample source image and the posture information of the sample source image, and the third background image, to obtain a reconstructed image.
  • the second condition can be a set loss threshold range; for example, when the loss function falls within the set threshold range, the corresponding reconstructed image has a small gap with the sample source image, indicating that the reconstruction effect of the reconstruction network is good.
  • when the loss function between the reconstructed image and the sample source image satisfies the second condition, it means that the reconstruction effect of the reconstruction network is good and the corresponding network parameters are reliable, so the network parameters of the reconstruction network can be shared with the dress-up network.
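  • one plausible way to realise this parameter sharing is to copy the reconstruction network's trained weights into the dress-up network wherever layer names and shapes match, e.g. via a non-strict state-dict load; this is an implementation assumption rather than a detail given in the disclosure.

```python
import torch

def share_parameters(reconstruction_net: torch.nn.Module, dressup_net: torch.nn.Module):
    """Copy matching weights from the trained reconstruction network into the dress-up
    network; layers that exist only in the dress-up network keep their own weights."""
    state = reconstruction_net.state_dict()
    missing, unexpected = dressup_net.load_state_dict(state, strict=False)
    return missing, unexpected
```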
  • Fig. 11 is a schematic diagram of an implementation of reconstructing a sample source image based on a reconstruction network provided by an embodiment of the present disclosure.
  • in the figure, G_s represents the reconstruction network; the inputs respectively represent the third background image corresponding to the sample source image, the portrait segmentation information in the portrait information of the sample source image, and the image obtained by intersecting the sample source image with the three-dimensional human body pose information of the sample source image (the purpose of the intersection is to use the 3D human body pose information to narrow the scope of the sample source image so that the clothing features are excluded).
  • O_s represents the output result, that is, the reconstructed image obtained by performing image reconstruction according to the above inputs; I_s represents the sample source image; and the loss function is computed between the reconstructed image and the sample source image.
  • Fig. 12 is a schematic diagram of an implementation of training the dress-up network provided by an embodiment of the present disclosure.
  • the reconstruction network shares its network parameters, which have completed their own training and satisfy the second condition, with the dress-up network.
  • the dress-up network then uses the network parameters of the reconstruction network to continue its iterative training.
  • in the figure, ℓ1 represents the loss function between the reconstructed image and the sample source image, and ℓ2 represents the loss function between the dress-up result of the dress-up network and the sample dress-up image.
  • a virtual dressing method provided by an embodiment of the present disclosure embodies the process of training and testing a dressing network.
  • in the training process of the dress-up network, by sharing the reliable network parameters obtained from the trained reconstruction network with the dress-up network, this method can assist the training and learning of the dress-up network; on the basis of improving the training efficiency of the dress-up network, it can also effectively improve its performance.
  • in this way the performance of the dress-up network can be further guaranteed, thereby improving the accuracy of virtual dress-up.
  • Fig. 13 is a schematic flow chart of a virtual dressing method provided by an embodiment of the present disclosure. This method is applicable to the situation where a character instance in any input image is changed, and the method can be executed by a virtual dressing device. , wherein the device can be implemented by software and/or hardware, and is generally integrated on electronic equipment.
  • electronic equipment includes but is not limited to: desktop computers, notebook computers, servers, tablet computers, or smart phones.
  • the virtual dressing method of this embodiment can quickly change any input image for a specified clothing without requiring the user to provide a source image. Therefore, it is suitable for simplified mobile application devices, such as mobile phones or tablet computers. Please refer to any of the foregoing embodiments for details that are not exhaustive in this embodiment.
  • a virtual dress-up method provided by an embodiment of the present disclosure includes the following steps:
  • the image to be changed can be considered as an image containing an instance of a character waiting for a virtual change.
  • for example, the image to be changed may be an image read from the album, which contains the character instance to be dressed up.
  • the preset network is trained based on the sample image pair, the sample image pair includes the sample target image and the corresponding sample dressing image, and the sample dressing image is obtained according to the pose information of the sample target image.
  • the preset network can be regarded as a pre-trained network capable of virtual clothing transformation.
  • the specified clothing can be considered as the target clothing.
  • the sample target image may refer to a sample image containing a character instance to be dressed up, and the sample dress-up image may refer to an image obtained by transforming the clothing of the character instance to be changed into specified clothing according to the posture information of the sample target image.
  • the sample image pair may refer to a plurality of sample pairs composed of a large number of sample target images and corresponding sample dressing images.
  • the preset network has the ability to change the specified clothing for the character instance in any input image, and it does not require the user to provide the source image.
  • the training process of the preset network only needs to use the sample target image and the sample dress-up image, with the sample target image used as input and the sample dress-up image used as output.
  • the sample target image and the corresponding sample dress-up image respectively correspond to the state of the character instance to be changed before and after the dress-up.
  • the sample dress-up image is obtained according to the posture information of the sample target image.
  • the poses of the person instances in the target image are different, and the displacement changes required for the specified clothing during the dressing process are different.
  • the sample target image and the sample dressing image can come from the training samples of the dressing network in the above embodiment; or, the sample target image can be any image containing a character instance, and its corresponding sample dressing image It can be generated through the dress-up network in the above-mentioned embodiments.
  • the process of generating a corresponding dress-up image according to the source image that provides the target clothing can refer to any of the above-mentioned embodiments. This will not be described in detail here.
  • the specified clothing can be a default fixed clothing.
  • the preset network can be specially used to transform the clothing of the character instance in the image to be changed into the default fixed clothing after training.
  • a dedicated preset network can be trained separately. When the clothing transformation is carried out through the preset network, less computing resources are occupied, and the obtained clothing transformation results are more professional and accurate.
  • the preset network can also be used for multiple clothing changes, and the specified clothing can also be arbitrarily specified by the user (such as the user to be changed) from the clothing library.
  • in this case, the preset network needs to learn the feature transformation rules of different clothing transformed onto the character instance to be changed, which raises the performance requirements of the preset network when performing clothing transformation, so that clothing transformation can be flexibly implemented for different specified clothing.
  • the preset network may be a Generative Adversarial Networks (GAN), such as a GAN-based image-to-image translation (pix2pix) model algorithm network.
  • the preset network is a separately trained network that can directly transform clothing. For example, for a specified clothing, an input such as the image to be changed is given to the preset network, and the corresponding dress-up image can be obtained directly at the output of the preset network.
  • the method of obtaining the sample dress-up image according to the posture information of the sample target image can be determined according to the virtual dress-up method in any one of the above-mentioned embodiments.
  • the virtual dress-up method in any of the above-mentioned embodiments can be used to generate a corresponding sample dress-up image according to the sample target image.
  • multiple sample image pairs composed of sample target images and corresponding sample dress-up images are used as the training data of the preset network to train it, so that the preset network can learn the rules of feature transformation during clothing transformation through training and acquire the corresponding dress-up capability; it can then obtain the corresponding dress-up result from a single input, such as the image to be changed, thereby realizing the practical application of virtual dress-up.
  • the preset network includes a generator and a discriminator;
  • the training process of the preset network includes: for the specified clothing, the generator is used to generate a composite image based on the sample target image; the discriminator is used to judge the authenticity of the composite image based on the sample dressing image; The above-mentioned operations of generating a composite image and judging the authenticity of the composite image are repeated until the judgment result satisfies the training stop condition.
  • the generator can refer to the module used in the preset network to generate a specific image by learning the feature distribution of the real image; for example, for the specified clothing, the generator can learn to generate the specified clothing according to the feature distribution information of the sample target image.
  • the discriminator can refer to the module in the preset network that is used to judge the authenticity of the image generated by the generator based on the real image; for example, the authenticity of the synthesized image can be judged by the discriminator based on the sample dress-up image.
  • the training stop condition may refer to the condition for stopping the training determined according to the discriminator's discrimination result of the authenticity of the synthesized image.
  • the training stop condition can be that the discrimination results of all synthetic images are true, or that the proportion of synthetic images whose discrimination results are true reaches a set threshold, for example 90% of the synthetic images are judged to be true; the setting of the threshold is not limited here.
  • the generator and the discriminator repeatedly perform the operation of generating a composite image and judging the authenticity of the composite image until the judgment result meets the training stop condition, and the training of the preset network is completed.
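  • a minimal pix2pix-style adversarial loop consistent with this description is sketched below; the conditional discriminator on (input, output) pairs and the auxiliary L1 term pulling the synthesis toward the sample dress-up image follow common pix2pix practice and are assumptions here rather than requirements of the disclosure.

```python
import torch
import torch.nn as nn

def train_preset_gan(G, D, loader, epochs=50, lr=2e-4, l1_weight=100.0):
    """G maps a sample target image to a synthetic dress-up image; D judges
    (input, synthetic) pairs against (input, sample dress-up image) pairs."""
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    for _ in range(epochs):
        for tgt, real_dressup in loader:                  # one sample image pair
            fake = G(tgt)                                  # synthetic dress-up image
            # Discriminator step: real pairs -> 1, synthetic pairs -> 0.
            d_real = D(torch.cat([tgt, real_dressup], dim=1))
            d_fake = D(torch.cat([tgt, fake.detach()], dim=1))
            d_loss = bce(d_real, torch.ones_like(d_real)) + \
                     bce(d_fake, torch.zeros_like(d_fake))
            opt_d.zero_grad()
            d_loss.backward()
            opt_d.step()
            # Generator step: fool D and stay close to the sample dress-up image.
            d_fake_g = D(torch.cat([tgt, fake], dim=1))
            g_loss = bce(d_fake_g, torch.ones_like(d_fake_g)) + l1_weight * l1(fake, real_dressup)
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
```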
  • in the virtual dressing method provided by this embodiment of the present disclosure, the method first obtains an image to be changed, and then transforms the clothing of the character instance in the image to be changed into specified clothing through a preset network; the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to the pose information of the sample target image.
  • using the sample image pairs generated by the virtual dressing method of the above embodiments as training data provides a reliable basis for training the preset network, so that the trained preset network can guarantee the image quality after the dress-up and, on this basis, fast and accurate virtual dressing is realized.
  • Fig. 14 is a schematic structural diagram of a virtual dressing device provided by an embodiment of the present disclosure, wherein the device can be implemented by software and/or hardware, and is generally integrated on an electronic device.
  • the device includes: a first acquisition module 610, a processing module 620 and a first dress-up module 630;
  • the first obtaining module 610 is configured to obtain a source image and a target image, wherein the source image includes the target clothing associated with the first character instance, and the target image includes the second character instance;
  • a processing module 620 configured to process the source image and the target image to obtain first portrait information and first pose information of the source image, and second pose information of the target image, respectively;
  • the first dress-up module 630 is configured to transform, according to the first portrait information, the first pose information and the second pose information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  • the device first acquires a source image and a target image through the first acquisition module 610, the source image including the target clothing associated with the first character instance and the target image including the second character instance; the processing module 620 then processes the source image and the target image to obtain the first portrait information and first pose information of the source image, and the second pose information of the target image, respectively;
  • finally, the first dress-up module 630 transforms, according to the first portrait information, the first pose information and the second pose information, the clothing of the second character instance in the target image into the target clothing in the source image, and obtains a dress-up image.
  • by using the portrait information and pose information of the source image and the target image, the transformation of the target clothing between different characters can be realized, and characters in arbitrary poses can be dressed up according to the poses of different character instances, thus improving the flexibility and accuracy of virtual dress-up.
  • the first dress-up module 630 includes:
  • a segmentation auxiliary information determination unit configured to input the first portrait information, the first pose information and the second pose information into a segmentation auxiliary network to obtain segmentation auxiliary information of the target clothing under the pose of the second character instance;
  • a pixel displacement channel map determination unit configured to input the first portrait information, the first pose information, the second pose information and the segmentation auxiliary information into a clothing displacement auxiliary network to obtain a pixel displacement channel map for transforming the target clothing from the first character instance to the second character instance;
  • a dress-up image determination unit configured to transform, according to the pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map and the segmentation auxiliary information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  • the segmentation auxiliary network is a neural network including a double-branch input
  • the input of the first branch includes the first portrait information and the first posture information, wherein the first portrait information includes first portrait segmentation information and first portrait analysis information, and the first The posture information includes the three-dimensional human body posture information and key point information of the human body of the first character instance;
  • the input of the second branch includes the second posture information, and the second posture information includes the three-dimensional human body posture information and the key point information of the human body of the second character instance.
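  • One plausible realization of such a dual-branch network is sketched below; the channel counts (20 parsing classes, 18 keypoint heatmaps), the shared decoder and the output layout (one mask channel plus parsing channels) are illustrative assumptions, not values fixed by the disclosure.

    import torch
    import torch.nn as nn

    class DualBranchSegAuxNet(nn.Module):
        """Illustrative dual-branch network: a source branch (portrait segmentation,
        portrait analysis maps, 3D pose map, keypoint heatmaps of the first character
        instance) and a target branch (3D pose map and keypoint heatmaps of the second
        character instance) are encoded separately and decoded jointly into the
        segmentation auxiliary information under the target pose."""
        def __init__(self, src_ch: int = 1 + 20 + 3 + 18, dst_ch: int = 3 + 18,
                     out_ch: int = 1 + 20):              # mask + parsing maps (assumed)
            super().__init__()
            def encoder(c_in):
                return nn.Sequential(
                    nn.Conv2d(c_in, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                )
            self.src_enc = encoder(src_ch)
            self.dst_enc = encoder(dst_ch)
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, out_ch, 3, padding=1),
            )

        def forward(self, src_branch: torch.Tensor, dst_branch: torch.Tensor) -> torch.Tensor:
            # encode each branch, fuse along channels, decode to segmentation auxiliary info
            fused = torch.cat([self.src_enc(src_branch), self.dst_enc(dst_branch)], dim=1)
            return self.decoder(fused)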
  • the clothing displacement auxiliary network is a neural network including a double-branch input
  • the input of the first branch includes the first portrait information and the first posture information, wherein the first portrait information includes first portrait segmentation information and first portrait analysis information, and the first The posture information includes the three-dimensional human body posture information and key point information of the human body of the first character instance;
  • the input of the second branch includes the segmentation auxiliary information and the second pose information, wherein the segmentation auxiliary information includes the second portrait segmentation information and the second portrait analysis information of the target clothing under the pose of the second character instance, and the second pose information includes the three-dimensional human body pose information and human body key point information of the second character instance.
  • the dress-up image determination unit includes:
  • the segmentation map determination subunit is configured to obtain the background segmentation map of the target image through an instance segmentation algorithm
  • the first background image determination subunit is configured to perform background completion on the background segmentation image of the target image through the background completion network to obtain the first background image;
  • the second character instance determination subunit is configured to transform, according to the pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map and the segmentation auxiliary information, the clothing of the second character instance in the target image into the target clothing in the source image through the dress-up network, to obtain the second character instance after the dress-up;
  • the dress-up image determination subunit is configured to fuse the image of the second character instance after the dress-up into the first background image to obtain a dress-up image.
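  • A simplified sketch of the last two subunits (background completion followed by fusion) is given below; the inpainting network is left abstract and its call signature is assumed, and mask-based alpha compositing is only one plausible way to fuse the dressed character instance into the first background image.

    import torch

    def complete_and_fuse(target_img: torch.Tensor, person_mask: torch.Tensor,
                          inpaint_net: torch.nn.Module,
                          dressed_person: torch.Tensor, dressed_mask: torch.Tensor) -> torch.Tensor:
        """target_img: (N,3,H,W); person_mask / dressed_mask: (N,1,H,W) in {0,1}.
        Removes the original character, completes the background, then pastes the
        dressed character instance back onto the completed first background image."""
        background_only = target_img * (1.0 - person_mask)            # background segmentation map
        first_background = inpaint_net(background_only, person_mask)  # assumed inpainting signature
        dressup_image = dressed_mask * dressed_person + (1.0 - dressed_mask) * first_background
        return dressup_image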
  • the second character instance determination subunit is configured to: determine the pose displacement channel map according to the first pose information and the second pose information; superimpose the pose displacement channel map and the pixel displacement channel map to obtain a combined displacement channel map; and input the segmentation auxiliary information and the combined displacement channel map into the dress-up network, so as to transform, through the dress-up network, the clothing of the second character instance in the target image into the target clothing in the source image and obtain the second character instance after the dress-up.
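  • Reading the combined displacement channel map as a dense backward flow in normalized coordinates, the superposition and warping steps can be illustrated with torch.nn.functional.grid_sample as below; summing the two maps is one simple interpretation of the superposition, assumed here only for the sketch.

    import torch
    import torch.nn.functional as F

    def warp_with_combined_flow(clothing: torch.Tensor, pose_flow: torch.Tensor,
                                pixel_flow: torch.Tensor) -> torch.Tensor:
        """clothing: (N,C,H,W); pose_flow / pixel_flow: (N,2,H,W) displacement channel maps.
        Superimposes the pose displacement map and the pixel displacement map into a
        combined displacement map and warps the clothing features accordingly."""
        n, _, h, w = clothing.shape
        combined = pose_flow + pixel_flow                  # combined displacement channel map

        # base sampling grid in normalized [-1, 1] coordinates
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1).to(clothing)

        grid = base + combined.permute(0, 2, 3, 1)         # add (dx, dy) per pixel
        return F.grid_sample(clothing, grid, mode="bilinear",
                             padding_mode="border", align_corners=True)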
  • the device is further configured to determine a protected area of the second character instance according to the second portrait information; in the process of transforming the clothing of the second character instance in the target image into the target clothing in the source image, the features within the protected area remain unchanged.
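  • The protected-area constraint can be enforced at composition time, for example by copying the original pixels back wherever the protection mask is set; the sketch below assumes a binary mask over face, hands and feet derived from the second portrait information.

    import torch

    def apply_protection(dressed: torch.Tensor, original: torch.Tensor,
                         protect_mask: torch.Tensor) -> torch.Tensor:
        """dressed / original: (N,3,H,W); protect_mask: (N,1,H,W) with 1 inside the
        protected area (e.g. face, hands, feet of the second character instance).
        Features inside the protected area are kept unchanged after the dress-up."""
        return protect_mask * original + (1.0 - protect_mask) * dressed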
  • the dress-up network is trained based on sample source images, sample target images and sample dress-up images
  • the training process of the dress-up network includes: in each iteration, performing the following operations until the loss function between the dress-up result and the sample dress-up image satisfies a first condition: obtaining the background segmentation map of the sample target image through an instance segmentation algorithm; performing background completion on the background segmentation map of the sample target image through the background completion network to obtain a second background image; transforming, through the dress-up network, the clothing in the sample target image into the target clothing in the sample source image to obtain a character instance after the dress-up; and fusing the image of the character instance after the dress-up into the second background image to obtain the dress-up result.
  • the dress-up network shares the network parameters of a reconstruction network
  • the reconstruction network is configured to perform the following operations in each iteration during the training process of the dress-up network, and to share its network parameters with the dress-up network once the loss function between the reconstructed image and the sample source image satisfies a second condition: obtaining the background segmentation map of the sample source image through the instance segmentation algorithm; performing background completion on the background segmentation map of the sample source image through the background completion network to obtain a third background image; and performing image reconstruction through the reconstruction network according to the portrait information of the sample source image, the intersection of the sample source image with the pose information of the sample source image, and the third background image, to obtain the reconstructed image.
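  • Sharing the reconstruction network's parameters with the dress-up network can be done by copying compatible weights once the reconstruction loss satisfies the second condition; the sketch below assumes the two networks expose identically named sub-modules, which is an implementation choice rather than something the disclosure fixes.

    import torch.nn as nn

    def share_parameters(reconstruction_net: nn.Module, dressup_net: nn.Module) -> None:
        """Copy every parameter of the reconstruction network whose name and shape match
        a parameter of the dress-up network; unmatched parameters are left untouched."""
        src = reconstruction_net.state_dict()
        dst = dressup_net.state_dict()
        shared = {k: v for k, v in src.items() if k in dst and v.shape == dst[k].shape}
        dst.update(shared)
        dressup_net.load_state_dict(dst)

    # During training of the dress-up network, this could be called once the
    # reconstruction loss between the reconstructed image and the sample source
    # image satisfies the second condition (threshold name assumed):
    # if reconstruction_loss.item() < second_condition_threshold:
    #     share_parameters(reconstruction_net, dressup_net)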
  • the device also includes:
  • the testing module is configured to test the dressing network based on the sample source image and the sample target image.
  • the testing of the dress-up network includes: transforming the clothing in the sample source image onto the character instance in the sample target image to obtain an intermediate image; transforming the clothing of the intermediate image onto the character instance in the sample source image to obtain a result image; and determining a test result according to an error between the result image and the sample source image.
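  • The round-trip test performed by the testing module can be expressed as a cycle-consistency check, as in the sketch below; the dress_up callable, the L1 error metric and the acceptance threshold are assumptions made for illustration.

    import torch
    import torch.nn.functional as F

    def round_trip_test(dress_up, sample_source: torch.Tensor, sample_target: torch.Tensor,
                        threshold: float = 0.05) -> bool:
        """dress_up(clothing_source, person_target) -> image of person_target wearing
        the clothing from clothing_source.  Transfer A's clothing onto B, then transfer
        it back onto A, and compare the result image with the original sample source image."""
        with torch.no_grad():
            intermediate = dress_up(sample_source, sample_target)   # A's clothing on B
            result = dress_up(intermediate, sample_source)          # back onto A
            error = F.l1_loss(result, sample_source).item()
        return error <= threshold          # assumed acceptance threshold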
  • the above-mentioned virtual dressing device can execute the virtual dressing method provided in Embodiments 1 to 4 of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • Fig. 15 is a schematic structural diagram of a virtual dressing device provided by an embodiment of the present disclosure, wherein the device can be implemented by software and/or hardware, and is generally integrated on an electronic device.
  • the device includes: a second determination module 710 and a second dress-up module 720;
  • the second determination module 710 is configured to obtain an image to be changed;
  • the second dress-up module 720 is configured to transform the clothing of the character instance in the image to be changed into specified clothing through a preset network; the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to the pose information of the sample target image.
  • the device first obtains the image to be changed through the second determination module 710, and then the second dress-up module 720 transforms the clothing of the character instance in the image to be changed into specified clothing through a preset network; the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to the pose information of the sample target image.
  • with this device, through an independent preset network, the virtual transformation of clothing can be realized from a single input, which provides convenience for users in practical applications.
  • using the sample image pairs generated by the virtual dressing method of the above embodiments as training data provides a reliable basis for training the preset network, so that the trained preset network can guarantee the image quality after the dress-up and, on this basis, fast and accurate virtual dressing is realized.
  • the preset network includes a generator and a discriminator
  • the training process of the preset network includes: for the specified clothing, generating a synthetic image from the sample target image through the generator; judging the authenticity of the synthetic image through the discriminator based on the sample dress-up image; and repeating the operations of generating a synthetic image and judging its authenticity until the judgment result satisfies the training stop condition.
  • the sample dress-up image is obtained according to the pose information of the sample target image by using the virtual dress-up method described in any one of the above-mentioned embodiments.
  • the above-mentioned virtual dressing device can execute the virtual dressing method provided by the embodiments of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 16 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 16 shows a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure.
  • the electronic device 400 shown in FIG. 16 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • an electronic device 400 may include one or more processors 401 (such as a central processing unit, a graphics processing unit, etc.), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from the storage device 408 into a random access memory (RAM) 403.
  • processors 401 implement the virtual dress-up method as provided in the present disclosure.
  • the RAM 403 also stores various programs and data necessary for the operation of the electronic device 400.
  • the processor 401, ROM 402, and RAM 403 are connected to each other through a bus 404.
  • An input/output (Input/Output, I/O) interface 405 is also connected to the bus 404 .
  • generally, the following devices may be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 407 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 408 including, for example, a magnetic tape, a hard disk, etc., configured to store one or more programs; and a communication device 409.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 16 shows electronic device 400 having various means, it should be understood that implementing or possessing all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication means 409 , or from storage means 408 , or from ROM 402 .
  • the computer program is executed by the processor 401, the above functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer readable storage medium may be a non-transitory computer readable storage medium.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (Compact Disc Read-Only Memory, CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server may communicate using any currently known or future-developed network protocol such as the Hyper Text Transfer Protocol (HTTP), and may be interconnected with digital data communication in any form or medium (for example, a communication network).
  • Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (eg, the Internet), and peer-to-peer networks (eg, ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be contained in the above-mentioned electronic device 400 ; or it may exist independently without being assembled into the electronic device 400 .
  • the above-mentioned computer-readable medium stores one or more computer programs, and the above-mentioned computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device 400 implements the methods of the embodiments of the present disclosure. Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or a portion of code that includes one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware; in some cases, the name of a module does not constitute a limitation on the module itself.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (Field Programmable Gate Arrays, FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (Application Specific Standard Parts, ASSP), System on Chip (System on Chip, SOC), Complex Programmable Logic Device (Complex Programming logic device, CPLD) and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a virtual dress-up method, including: acquiring a source image and a target image, the source image including target clothing associated with a first character instance and the target image including a second character instance; processing the source image and the target image to respectively obtain first portrait information and first pose information of the source image, and second pose information of the target image; and transforming, according to the first portrait information, the first pose information and the second pose information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  • Example 2: according to the method described in Example 1, transforming, according to the first portrait information, the first pose information and the second pose information, the clothing of the second character instance in the target image into the target clothing in the source image to obtain a dress-up image includes: inputting the first portrait information, the first pose information and the second pose information into a segmentation auxiliary network to obtain segmentation auxiliary information of the target clothing under the pose of the second character instance; inputting the first portrait information, the first pose information, the second pose information and the segmentation auxiliary information into a clothing displacement auxiliary network to obtain a pixel displacement channel map for transforming the target clothing from the first character instance to the second character instance; and transforming, according to the pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map and the segmentation auxiliary information, the clothing of the second character instance in the target image into the target clothing in the source image to obtain the dress-up image.
  • Example 3: according to the method described in Example 2,
  • the segmentation auxiliary network is a neural network including a double-branch input
  • the input of the first branch includes the first portrait information and the first posture information, wherein the first portrait information includes first portrait segmentation information and first portrait analysis information, and the first The posture information includes the three-dimensional human body posture information and key point information of the human body of the first character instance;
  • the input of the second branch includes the second posture information, and the second posture information includes the three-dimensional human body posture information and the key point information of the human body of the second character instance.
  • Example 4: according to the method described in Example 2,
  • the clothing displacement auxiliary network is a neural network including a double-branch input
  • the input of the first branch includes the first portrait information and the first posture information, wherein the first portrait information includes first portrait segmentation information and first portrait analysis information, and the first The posture information includes the three-dimensional human body posture information and key point information of the human body of the first character instance;
  • the input of the second branch includes the segmentation auxiliary information and the second pose information, wherein the segmentation auxiliary information includes the second portrait segmentation information and the second portrait analysis information of the target clothing under the pose of the second character instance, and the second pose information includes the three-dimensional human body pose information and human body key point information of the second character instance.
  • Example 5: according to the method described in Example 2, obtaining the dress-up image according to the pose displacement channel map, the pixel displacement channel map and the segmentation auxiliary information includes: obtaining the background segmentation map of the target image through an instance segmentation algorithm; performing background completion on the background segmentation map of the target image through a background completion network to obtain a first background image; transforming, through a dress-up network, the clothing of the second character instance in the target image into the target clothing in the source image according to the pose displacement channel map, the pixel displacement channel map and the segmentation auxiliary information, to obtain the second character instance after the dress-up; and fusing the image of the second character instance after the dress-up into the first background image to obtain the dress-up image.
  • Example 6: according to the method described in Example 5, obtaining the second character instance after the dress-up includes: determining the pose displacement channel map according to the first pose information and the second pose information; superimposing the pose displacement channel map and the pixel displacement channel map to obtain a combined displacement channel map; and inputting the segmentation auxiliary information and the combined displacement channel map into the dress-up network, so as to transform, through the dress-up network, the clothing of the second character instance in the target image into the target clothing in the source image and obtain the second character instance after the dress-up.
  • Example 7: according to the method described in Example 1, the method further comprises: determining a protected area of the second character instance according to second portrait information; in the process of transforming the clothing of the second character instance in the target image into the target clothing in the source image, the features within the protected area remain unchanged.
  • Example 8: according to the method described in Example 5, the dress-up network is trained based on sample source images, sample target images and sample dress-up images; the training process of the dress-up network includes: in each iteration, performing the following operations until the loss function between the dress-up result and the sample dress-up image satisfies a first condition: obtaining the background segmentation map of the sample target image through the instance segmentation algorithm; performing background completion on the background segmentation map of the sample target image through the background completion network to obtain a second background image; transforming, through the dress-up network, the clothing in the sample target image into the target clothing in the sample source image to obtain a character instance after the dress-up; and fusing the image of the character instance after the dress-up into the second background image to obtain the dress-up result.
  • Example 9: according to the method described in Example 8, the dress-up network shares the network parameters of a reconstruction network; the reconstruction network is used to perform the following operations in each iteration during the training process of the dress-up network, and to share its network parameters with the dress-up network once the loss function between the reconstructed image and the sample source image satisfies a second condition: obtaining the background segmentation map of the sample source image through the instance segmentation algorithm; performing background completion on the background segmentation map of the sample source image through the background completion network to obtain a third background image; and performing image reconstruction through the reconstruction network according to the portrait information of the sample source image, the intersection of the sample source image with the pose information of the sample source image, and the third background image, to obtain the reconstructed image.
  • Example 10: according to the method described in Example 5, the method further includes testing the dress-up network based on a sample source image and a sample target image; the testing of the dress-up network includes: transforming the clothing in the sample source image onto the character instance in the sample target image to obtain an intermediate image; transforming the clothing of the intermediate image onto the character instance in the sample source image to obtain a result image; and determining a test result according to an error between the result image and the sample source image.
  • Example 11 provides a virtual dress-up method, including: obtaining an image to be changed; and transforming the clothing of the character instance in the image to be changed into specified clothing through a preset network; wherein the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to the pose information of the sample target image.
  • Example 12: according to the method described in Example 11, the preset network includes a generator and a discriminator; the training process of the preset network includes: for the specified clothing, generating a synthetic image from the sample target image through the generator; judging the authenticity of the synthetic image through the discriminator based on the sample dress-up image; and repeating the operations of generating a synthetic image and judging its authenticity until the judgment result satisfies a training stop condition.
  • Example 13: according to the method described in Example 12, the sample dress-up image is obtained according to the pose information of the sample target image by using the virtual dress-up method according to any one of Examples 1-10.
  • Example 14 provides a virtual dressing device, including:
  • the first acquisition module is configured to acquire a source image and a target image, the source image includes the target clothing associated with the first character instance, and the target image includes the second character instance;
  • a processing module configured to process the source image and the target image to respectively obtain first portrait information and first pose information of the source image, and second pose information of the target image;
  • a first dress-up module configured to transform, according to the first portrait information, the first pose information and the second pose information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  • Example 15 provides a virtual dressing device, including:
  • the second determination module is configured to obtain the image to be changed
  • a second dress-up module configured to transform the clothing of the character instance in the image to be changed into specified clothing through a preset network; wherein the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to the pose information of the sample target image.
  • Example 16 provides an electronic device, comprising:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of Examples 1-13.
  • Example 17 provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method described in any one of Examples 1-13 is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure discloses a virtual dress-up method and apparatus, an electronic device and a readable medium. The method includes: acquiring a source image and a target image, the source image including target clothing associated with a first character instance and the target image including a second character instance; processing the source image and the target image to respectively obtain first portrait information and first pose information of the source image, and second pose information of the target image; and transforming, according to the first portrait information, the first pose information and the second pose information, the clothing of the second character instance in the target image into the target clothing in the source image, to obtain a dress-up image.

Description

虚拟换装方法、装置、电子设备及可读介质
本申请要求在2021年12月15日提交中国专利局、申请号为202111539373.2的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
本公开实施例涉及图像处理技术领域,例如涉及一种虚拟换装方法、装置、电子设备及可读介质。
背景技术
现在,越来越多的人通过线上的方式挑选服装,但是由于不能实际试穿,人们往往不知道这个服装是否真的适合自己;或者,一些人在看到其他人穿着的服装时,也想知道自己穿上这种服装的效果;此外,一些场景下需要对照片进行换装处理,使得照片更美观、更有趣味性,或者使得照片可用于特定用途,例如,给日常生活中拍摄的照片中的人物换上正装,将换装后的照片放进个人资料中等。因此,虚拟换装具有广泛的应用。
然而,相关技术中的虚拟换装方法只是简单的服装替换,即,将指定图像中的服装区域分割出来并迁移到另一图像中的人物身上,在此过程中服装区域可能被简单的缩放或旋转等。但是,不同人物的体型或姿态复杂多样,将同一件服装迁移到不同体型或姿态的人身上,会存在较大的误差,导致服装区域和人物姿态不匹配、和人物身体部位的衔接部分的处理较为粗糙等,具有很强的局限性。
发明内容
本公开提供了一种虚拟换装方法、装置、电子设备及可读介质,以提高虚拟换装的灵活性和准确性。
第一方面,本公开实施例提供了一种虚拟换装方法,包括:
获取源图像和目标图像,所述源图像中包括关联于第一人物实例的目标服装,所述目标图像中包括第二人物实例;
对所述源图像和所述目标图像进行处理,分别得到所述源图像的第一人像信息和第一姿态信息,以及所述目标图像的第二姿态信息;
根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
第二方面,本公开实施例还提供了一种虚拟换装方法,包括:
获取待换装图像;
通过预设网络将所述待换装图像中人物实例的服装变换为指定服装;
其中,所述预设网络基于样本图像对训练得到,所述样本图像对包括样本目标图像和对应的样本换装图像,所述样本换装图像根据所述样本目标图像的姿态信息得到。
第三方面,本公开实施例还提供了一种虚拟换装装置,包括:
第一获取模块,设置为获取源图像和目标图像,所述源图像中包括关联于第一人物实例的目标服装,所述目标图像中包括第二人物实例;
处理模块,设置为对所述源图像和所述目标图像进行处理,分别得到所述源图像的第一人像信息和第一姿态信息,以及所述目标图像的第二姿态信息;
第一换装模块,设置为根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
第四方面,本公开实施例还提供了一种虚拟换装装置,包括:
第二确定模块,设置为获取目标图像并确定目标服装;
第二换装模块,设置为通过预设网络将所述目标图像中的人物实例的服装变换为所述目标服装,得到换装图像。
第五方面,本公开实施例还提供了一种电子设备,包括:
一个或多个处理器;
存储装置,设置为存储一个或多个程序;
所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现本公开实施例提供的虚拟换装方法。
第六方面,本公开实施例还提供了一种计算机可读介质,其上存储有计算机程序,该程序被处理器执行时实现本公开实施例提供的虚拟换装方法。
附图说明
贯穿附图中,相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的,原件和元素不一定按照比例绘制。
图1为本公开一实施例提供的一种虚拟换装方法的流程示意图;
图2为本公开另一实施例提供的一种虚拟换装方法的流程示意图;
图3为本公开一实施例提供的一种分割辅助网络的输入分支的示意图;
图4为本公开一实施例提供的一种基于分割辅助网络生成分割辅助信息的实现示意图;
图5为本公开一实施例提供的一种服装位移辅助网络的输入分支的示意图;
图6为本公开一实施例提供的一种基于服装位移辅助网络生成像素位移通道图的实现示意图;
图7为本公开另一实施例提供的一种虚拟换装方法的流程示意图;
图8为本公开一实施例提供的一种基于背景补全网络实现背景补全的示意图;
图9为本公开一实施例提供的一种基于换装网络生成换装图像的实现示意图;
图10为本公开一实施例提供的一种训练换装网络的流程示意图;
图11为本公开一实施例提供的一种基于重建网络重建样本源图像的实现示意图;
图12为本公开一实施例提供的一种训练换装网络的实现示意图;
图13为本公开另一实施例提供的一种虚拟换装方法的流程示意图;
图14为本公开一实施例提供的一种虚拟换装装置的结构示意图;
图15为本公开另一实施例提供的一种虚拟换装装置的结构示意图;
图16为本公开一实施例提供的一种电子设备的结构示意图。
具体实施方式
应当理解,本公开的方法实施方式中记载的多个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
下述多个实施例中,每个实施例中同时提供了示例特征和示例,实施例中记载的多个特征可进行组合,形成多个示例方案,不应将每个编号的实施例仅视为一个技术方案。此外,在不冲突的情况下,本公开中的实施例及实施例中的特征可以相互组合。
图1为本公开一实施例提供的一种虚拟换装方法的流程示意图。该方法可适用于根据不同图像的人像信息和姿态信息对目标服装进行变换,以实现不同人物之间的虚拟换装的情况。该方法可以由虚拟换装装置来执行,其中该装置可由软件和/或硬件实现,并一般集成在具备图像处理能力的电子设备上,在本实施例中电子设备包括但不限于:台式计算机、笔记本电脑、服务器、平板电脑或智能手机等设备。
如图1所示,本公开实施例提供的一种虚拟换装方法,包括如下步骤:
S110、获取源图像和目标图像,所述源图像中包括关联于第一人物实例的目标服装,所述目标图像中包括第二人物实例。
在本实施例中,源图像可以指提供目标服装且包含第一人物实例的图像,例如穿有目标服装的模特的图像,该模特即为第一人物实例。可以通过预设数据库获取源图像,其中预设数据库可以指预先存储有大量穿着相同或不同服装的模特图像的数据库。目标图像可以指包含待换装的第二人物实例的图像,例如想要进行换装的用户可上传一张自己的照片,将用户上传的照片作为所获取的目标图像。其中,人物实例可以认为是指图像中所包含的人物对象。目标服装可以认为是源图像所提供的、用于变换至目标图像中的人物实例上的服装。源图像和目标图像中不仅可以包含对应的人物实例和其对应的服装(如模特和用户的本身和其所穿的服装)的特征,还可以包含对应的人物实例所处的背景的特征。
第一人物实例可以指源图像中所包含的人物对象。第二人物实例可以指目标图像中所包含的人物对象。
S120、对所述源图像和所述目标图像进行处理,分别得到所述源图像的第一人像信息和第一姿态信息,以及所述目标图像的第二姿态信息。
在本实施例中,处理可以指对源图像和目标图像进行相应的图像处理操作,以得到对应的人像信息和姿态信息。例如,可以对提取源图像和目标图像进行实例分割或语义分割、识别源图像和目标图像中人物实例的姿态、提取源图像和目标图像中人物实例的人体关键点,以及对源图像和目标图像中人物实例进行解析,确定不同的身体部位和服装特征等。
其中,人像信息可以指对源图像和目标图像中所包含的人物实例和服装进行特征提取后所得到的图像,主要目的是确定人物实例的轮廓、裸露的身体部位以及被服装遮盖的身体部位等。人像信息可以包括人像分割信息和人像解析信息。人像分割信息可以指根据图像中所包含的人物实例和服装的整体轮廓对其进行分割后所得到的图像,主要目的是确定人物实例的轮廓。需要说明的是,对于源图像中的第一人物实例,人像分割信息主要指第一人物实例在穿着目标服装的情况下的轮廓,即,该轮廓需要考虑目标服装的轮廓确定。也可以理解为,该轮廓为第一人物实例的轮廓与目标服装轮廓的并集。人像解析信息可以指对图像中包含的人物实例和服装进行轮廓分割和部位解析所得到的图像,如对人物实例的四肢、头部和脚部等裸露的身体部位进行解析和区分,对服装的整体、上衣和下衣等部位进行解析和区分等;人像解析信息中的不同部位可以通过不同的颜色或纹理来表征。
姿态信息可以指对源图像和目标图像中所包含的人物实例进行人体关节点定位处理后得到相应的像素点信息,多个像素点信息可构成表征姿态的图像。姿态信息可以包括三维(3-Dimension,3D)姿态信息和二维(2-Dimension,2D)姿态信息。3D姿态信息可以指由上述像素点信息中的3D像素点坐标所构成的表征人物实例的立体姿态的图像,可利用具备相应姿态的人体示意图表征;2D姿态信息可以指由上述像素点信息中的2D像素点坐标信息所构成的表征姿态的图像,可利用符合相应姿态的关键点或关键点的连线表征。
第一人像信息和第一姿态信息可以分别认为是指对源图像进行处理后所得到的人像信息和姿态信息。第二姿态信息可以认为是指对目标图像进行处理后所得到的姿态信息。
需要说明的是,本实施例对源图像和目标图像进行处理的具体方法不作限定,可根据实际需求灵活选择相应的图像处理算法。
S130、根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
在本实施例中,第二人物实例的服装可以认为是指目标图像中人物实例所穿的服装。需要说明的是,变换并不是指通过抠图的方式将第二人物实例的服装抠掉,替换为关联于第一人物实例的目标服装。变换的原理主要是,根据第一人像信息、第一姿态信息和第二姿态信息分析得到第一人物实例与第二人物实例之间的姿态变化,从而得到第一人物实例所关联的目标服装变换到第二人物实例的姿态上所需的位移量,相应的将源图像中的目标服装按照上述所需的位移量发生对应的位移变化,以在替换目标图像中第二人物实例的服装的同时,能够融合到第二人物实例的姿态上,以得到目标服装在第二人物实例姿态下的图像。在此基础上,通过将目标图像中第二人物实例的服装变换为源图像中的目标服装,可以得到对应的换装图像。
例如,在变换过程中,还可根据目标图像的第二人像信息确定第二人物实例的保护区域,在将目标图像中第二人物实例的服装变换为源图像中的目标服装的过程中,保护区域内的特征保持不变。
其中,第二人像信息可以指对目标图像进行处理后所得到的人像信息。保护区域可以指目标图像中第二人物实例的脸部、手部以及脚部等裸露的身体部位。本实施例中,在将目标图像中第二人物实例的服装变换为源图像中的目标服装的过程中,为保证目标图像中第二人物实例(即用户)的脸部、手部以及脚部等部位不会被影响,可以利用第二人像信息,对第二人物实例进行分割和部位的解析,确定第二人物实例的保护区域,并使得保护区域内的特征保持不变,从而保证用户能够准确地看到被换装的人穿上目标服装后的完整效果。
例如,经过换装后的第二人物实例可以显示在指定背景上,如白色背景;也可以融合在目标图像的原背景中。需要说明的是,若将换装后的第二人物实例融合在目标图像的原背景中,则可以对目标图像进行抠图以及背景补全等操作;例如,首先可以通过抠图的方法将目标图像中的原第二人物实例抠掉,然后通过相应的图像处理方法将抠图后的目标图像中的背景补全,最后可以将换装后的第二人物实例融合到背景补全后的目标图像中,以得到对应的换装图像。在此基础上,通过还原第二人物实例换装后所处的背景和环境,使换装图像更真实、更准确。
本公开实施例提供的一种虚拟换装方法,首先获取源图像和目标图像,源图像中包括关联于第一人物实例的目标服装,目标图像中包括第二人物实例;然后对源图像和目标图像进行处理,分别得到源图像的第一人像信息和第一姿态信息,以及目标图像的第二姿态信息;最后根据第一人像信息、第一姿态信息和第二姿态信息,将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装图像。上述方法通过利用源图像和目标图像的人像信息和姿态信息,能够实现目标服装在不同人物之间的变换,在此基础 上,还可根据不同人物实例的姿态,实现对任意姿态的人物的换装,从而提高了虚拟换装的灵活性和准确性。
图2为本公开另一实施例提供的一种虚拟换装方法的流程示意图,本实施例在上述实施例中示例方案为基础进行细化。在本实施例中,对根据第一人像信息、第一姿态信息和第二姿态信息,将目标图像中第二人物实例的服装变换为源图像中的目标服装,以得到换装图像的过程进行了具体描述。本实施例尚未详尽的内容请参考前述实施例。
如图2所示,本公开实施例提供的一种虚拟换装方法,包括如下步骤:
S210、获取源图像和目标图像,源图像中包括关联于第一人物实例的目标服装,目标图像中包括第二人物实例。
S220、对源图像和目标图像进行处理,分别得到源图像的第一人像信息和第一姿态信息,以及目标图像的第二姿态信息。
S230、将第一人像信息、第一姿态信息和第二姿态信息输入至分割辅助网络,得到目标服装在第二人物实例的姿态下的分割辅助信息。
在本实施例中,分割辅助网络可以指一种对图像中人物实例和服装的轮廓特征信息进行分割的神经网络,例如为UNet网络。分割辅助信息可以指分割辅助网络根据第一人像信息、第一姿态信息和第二姿态信息得到的、用于反映目标服装的轮廓在第二人物实例的姿态下的特征的输出,输出的结果可以通过掩码(Mask)表征,因此分割辅助网络也可以理解为Mask辅助网络。分割辅助信息可以包括目标服装在第二人物实例的姿态下的人像分割信息和人像解析信息。也就是说,分割辅助信息可以认为是通过Mask辅助网络,根据第一人像信息,以及根据第一姿态信息和第二姿态信息所分析得到的第一和第二人物实例之间的姿态位移变化,所得到的姿态位移变化可使得目标服装的轮廓按照上述姿态变化位移发生相应的位移后,能够融合于第二人物实例的姿态。需要说明的是,在此之前获得的人像信息和姿态信息均是考虑了人物实例本身以及目标服装的轮廓的,因此在得到姿态位移变化的情况下,目标服装的轮廓也可变化为与第二人物实例的姿态相适应。
例如,分割辅助网络为包括双分支输入的神经网络,第一个分支的输入包括第一人像信息和第一姿态信息,其中,第一人像信息包括第一人像分割信息和第一人像解析信息,第一姿态信息包括第一人物实例的三维人体姿态信息和人体关键点信息;第二个分支的输入包括第二姿态信息,第二姿态信息包括第二人物实例的三维人体姿态信息和人体关键点信息。
其中,双分支输入可以认为是两个输入分支,即分割辅助网络可以为包括两个输入分支的神经网络。第一个分支可以认为是指源图像所对应的输入分支,第二个分支可以认为是指目标图像所对应的输入分支。
第一人像分割信息和第一人像解析信息可以分别指对源图像进行处理后所得到的人像分割信息和人像解析信息。
三维人体姿态信息可以指根据图像中人物实例的人体关节点所对应的3D像素点坐标信息所得到的表征立体姿态的图像。人体关键点信息可以指根据图像中人物实例的人体关节点所对应的2D像素点坐标信息所构成的表征姿态的图像。每个人物实例都可以具备对应的三维人体姿态信息和人体关键点信息。
例如,在通过分割辅助网络得到对应的分割辅助信息的过程中,第一个分支的输入可以包括第一人像信息和第一姿态信息,第二个分支的输入至少包括第二姿态信息,也可以包括第二人像信息和第二姿态信息,此处对此不作限定。
图3为本公开实施例提供的一种分割辅助网络的输入分支的示意图。如图3所示,第一输入分支表示源图像所对应的输入分支,其中s1和s2分别表示第一人像分割信息和第一人像解析信息,s3和s4分别表示第一人物实例的三维人体姿态信息和人体关键点信息;第二输入分支表示目标图像所对应的输入分支,其中d1和d2分别表示第二人物实例的三维人体姿态信息和人体关键点信息。
图4为本公开实施例提供的一种基于分割辅助网络生成分割辅助信息的实现示意图。如图4所示,src表示源图像对应的输入分支,即第一输入分支,dst表示目标图像对应的输入分支,即第二输入分支。编码器部分可以用于对输入分支的信息进行特征提取并进行相应的编码;解码器部分可以用于对经过残差块处理的编码信息进行相应的解码;编码器和解码器部分包括多个不同尺寸的卷积核,如通常可以为1×1,3×3,5×5和7×7等。残差块可以用于对编码器所传输的信息进行相应的图像处理。图中第一输入分支中,
Figure PCTCN2022137802-appb-000001和Figure PCTCN2022137802-appb-000002分别表示第一人像分割信息和第一人像解析信息；Figure PCTCN2022137802-appb-000003和Figure PCTCN2022137802-appb-000004分别表示第一人物实例的三维人体姿态信息和人体关键点信息。第二输入分支中，D t和J t分别表示第二人物实例的三维人体姿态信息和人体关键点信息。在Mask辅助网络的输出结果，即分割辅助信息中，M t表示目标服装在所述第二人物实例的姿态下的人像分割信息，S t表示目标服装在所述第二人物实例的姿态下的人像解析信息。
S240、将第一人像信息、第一姿态信息、第二姿态信息以及分割辅助信息输入至服装位移辅助网络,得到目标服装由第一人物实例变换到第二人物实例的像素位移通道图。
在本实施例中,服装位移(Clothflow)辅助网络可以指一种对服装像素点进行位移变化预测的神经网络,例如为UNet网络。其原理主要是,通过分析根据第一人物实例和第二人物实例之间的姿态位移变化,通过对目标服装进行像素点位移变化可使目标服装融合于目标图像第二人物实例姿态。像素位移通道图可以指通过Clothflow辅助网络,根据第一人像信息、第一姿态信息、第二姿态信息以及分割辅助信息预测得到的输出,用于反映为了使目标服装融合于目标图像中第二人物实例姿态所需的位移量,其形式上可以为2通道图像。2通道图像可以认为是表征目标服装的每个像素点坐标(x,y)位移变化的图像。
例如,服装位移辅助网络为包括双分支输入的神经网络,第一个分支的输入包括第一人像信息和第一姿态信息,其中,第一人像信息包括第一人像分割信息和第一人像解析信息,第一姿态信息包括第一人物实例的三维人体姿态信息和人体关键点信息;第二个分支的输入包括分割辅助信息和第二姿态信息,其中,分割辅助信息包括目标服装在第二人物实例的姿态下的第二人像分割信息和第二人像解析信息,第二姿态信息包括第二人物实例的三维人体姿态信息和人体关键点信息。
其中,第二人像分割信息和第二人像解析信息可以分别指对目标图像进行处理后所得到的人像分割信息和人像解析信息。
图5为本公开实施例提供的一种服装位移辅助网络的输入分支的示意图。如图5所示,第一输入分支表示源图像所对应的输入分支,第二输入分支表示目标图像所对应的输入分支。在第二输入分支中,第二人像分割信息和第二人像解析信息为Mask辅助模块输出的分割辅助信息。需要说明的是,人体关键线信息是一种参数化人体模型(Skinned Multi-Person Linear Model),可以形象地理解成人体关键点信息中关键点连线所构成的一种姿态信息。例如,服装位移辅助网络的两个输入分支中,也可以分别添加人体关键线信息作为输入,以提高人体姿态分析的准确性。
图6为本公开实施例提供的一种基于服装位移辅助网络生成像素位移通道图的实现示意图。如图6所示,将第一人像信息和第一姿态信息输入至第一个分支,将Mask分割辅助网络的输出结果(即分割辅助信息)和第二姿态信息输入至第二个分支,通过Clothflow辅助网络的相应处理,得到的输出结果为目标服装由第一人物实例变换到第二人物实例的像素位移通道图F p。此处对Clothflow辅助网络对多个输入信息的处理方法不作具体描述。
S250、根据第一姿态信息和第二姿态信息之间的姿态位移通道图、像素位移通道图以及分割辅助信息,将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装图像。
在本实施例中,姿态位移通道图可以指用于反映第一和第二人物实例之间的姿态位移变化的2通道图像。例如,姿态信息可以包括三维人体姿态信息、人体关键点信息和/或人体关键线信息。在此基础上,根据姿态位移通道图、像素位移通道图以及分割辅助信息,可以综合姿态位移变化、目标服装的像素位移变化以及目标服装在第二人物实例的姿态下的特征,将目标图像中第二人物实例的服装变换为源图像中的目标服装,以得到换装图像,此换装过程也可以通过神经网络实现。
本公开实施例提供的一种虚拟换装方法,具体化了根据人像信息和姿态信息以得到换装图像的过程。利用该方法,通过分割辅助网络和服装位移辅助网络对人像信息和姿态信息进行了相应的分割和位移变化预测,根据处理后得到的姿态位移通道图、像素位移通道图以及分割辅助信息进行换装变化,以实现目标服装在不同人物实例姿态变化下的位移变化和姿态融合,提高了虚拟换装的灵活性和准确性。
图7为本公开另一实施例提供的一种虚拟换装方法的流程示意图,本实施例在上述实施例中示例方案为基础进行细化。在本实施例中,对根据姿态位移通道图、像素位移通道图以及分割辅助信息,将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装图像的过程进行了具体描述。本实施例尚未详尽的内容请参考上述任意实施例。
如图7所示,本公开实施例提供的一种虚拟换装方法,包括如下步骤:
S310、通过实例分割算法得到目标图像的背景分割图。
在本实施例中,实例分割算法可以指一种对图像中某些类别的对象实例进行像素级别分割的算法。目标图像的背景分割图可以认为是通过实例分割算法,将目标图像中的人物实例去除后所得到的剩下的图像。
S320、通过背景补全网络对目标图像的背景分割图进行背景补全,得到第一背景图。
在本实施例中,背景补全网络可以指一种将背景分割图中被去除的空白区域进行背景补全的神经网络。例如,背景补全的原理主要指,根据背景分割图中空白区域周围的纹理或特征填充该空白区域。示例性的,对于目标图像的背景分割图中人物实例所处的空白区域,通过背景补全网络生成对应的区域背景,然后将生成的区域背景填充到目标图像的背景分割图中口空白位置处,以对空白区域进行补全,得到完整的背景。在此基础上,经过背景补全后的目标图像的背景分割图为第一背景图。
图8为本公开实施例提供的一种基于背景补全网络实现背景补全的示意图。如图8所示,G B表示背景补全网络,
Figure PCTCN2022137802-appb-000005表示目标图像的背景分割图，Figure PCTCN2022137802-appb-000006表示背景补全后的第一背景图。
S330、根据第一姿态信息和第二姿态信息之间的姿态位移通道图、像素位移通道图以及分割辅助信息, 通过换装网络将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装后的第二人物实例。
在本实施例中,换装网络可以指一种具备对不同人物实例之间的服装进行变换的能力的神经网络。根据第一姿态信息和第二姿态信息之间的姿态位移通道图、像素位移通道图以及分割辅助信息,通过换装网络,能够将目标图像中所述第二人物实例的服装按照第二人物实例的姿态产生相对位移变化,以在将目标图像中第二人物实例的服装变换为目标服装的过程中,使得位移变化后的目标服装融合到第二人物实例姿态上,以得到换装后的第二人物实例。
例如,根据第一姿态信息和第二姿态信息之间的姿态位移通道图、像素位移通道图以及分割辅助信息,将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装后的第二人物实例,包括:根据第一姿态信息和第二姿态信息确定姿态位移通道图;将姿态位移通道图与像素位移通道图叠加,得到组合位移通道图;将分割辅助信息以及组合位移通道图输入至换装网络,以通过换装网络将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装后的第二人物实例。
其中,叠加可以指将姿态位移通道图的特征信息与像素位移通道图的特征信息相融合。在此基础上,通过将姿态位移通道图与像素位移通道图叠加,能够得到特征信息相组合后的组合位移通道图。
例如,首先将通过Mask辅助网络所得到的分割辅助信息以及叠加所得到的组合位移通道图输入至换装网络中,此外,也可以将第二人物实例的保护区域的特征信息输入至换装网络中,以保证换装后的第二人物实例的保护区域的特征保持不变;然后通过换装网络对所输入的信息进行相应的处理;最后将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装后的第二人物实例。
S340、将换装后的第二人物实例图像融合在第一背景图中,得到换装图像。
在本实施例中,通过图像处理的相应算法,将换装后的第二人物实例融合到第一背景图中的相应位置,得到换装图像。
图9为本公开实施例提供的一种基于换装网络生成换装图像的实现示意图。如图9所示,I s表示源图像,I t表示目标图像,D s表示源图像中第一人物实例的三维人体姿态信息,D t表示目标图像中第二人物实例的三维人体姿态信息;F v表示根据D s和D t所确定的姿态位移通道图,F p表示像素位移通道图。在换装网络的输入分支中,
Figure PCTCN2022137802-appb-000007表示分割辅助信息，P t表示第二人物实例的保护区域的特征图，F vp表示根据F v和F p叠加生成的组合位移通道图，Figure PCTCN2022137802-appb-000008表示目标图像对应的第一背景图。G T表示换装网络，O t表示换装网络的输出结果，即将换装后的第二人物实例图像融合在第一背景图中所得到的换装图像。
本公开实施例提供的一种虚拟换装方法,具体化了通过换装网路进行服装变换的过程。利用该方法,通过背景补全网络对目标图像进行背景补全,并将换装后的第二人物实例融合进补全后的目标图像背景图中,换装背景特征的保持能够进一步提高虚拟换装的精细程度,提升了用户虚拟换装的使用体验。
图10为本公开实施例提供的一种训练换装网络的流程示意图,本实施例在上述实施例中多个示例方案为基础进行细化。在本实施例中,对换装网络的训练和测试的过程进行了具体描述。本实施例尚未详尽的内容请参考上述任意实施例。
如图10所示,本公开实施例提供的一种虚拟换装方法,包括如下步骤:
S410、通过实例分割算法得到样本目标图像的背景分割图。
在本实施例中,样本目标图像可以指包含待换服装和对应人物实例的样本图像。样本目标图像可以为多个,例如可以是多个具有不同人物实例姿态的样本图像。在本步骤中,在将样本目标图像输入至换装网络之前,通过实例分割算法得到样本目标图像的背景分割图。
S420、通过背景补全网络对样本目标图像的背景分割图进行背景补全,得到第二背景图。
在本实施例中,通过背景补全网络对上述所得到的样本目标图像的背景分割图进行背景补全,得到对应的第二背景图。
S430、通过换装网络将样本目标图像中的服装变换为样本源图像中的目标服装,得到换装后的人物实例。
在本实施例中,样本源图像可以认为是用于提供目标服装且包含人物实例的样本图像。根据样本源图像和样本目标图像所对应的人像信息和姿态信息,通过换装网络将样本目标图像中的服装变换为样本源图像中的目标服装,得到换装后的人物实例。
S440、将换装后的人物实例图像融合在第二背景图中,得到换装结果。
在本实施例中,换装结果可以指换装后的人物实例图像融合在第二背景图中后所得到的换装图像。
S450、判断换装结果与样本换装图像之间的损失函数是否满足第一条件?若是,则执行S460;若否,则返回执行S410,继续进行迭代训练。
在本实施例中,样本换装图像可以认为是样本源图像中的目标服装变换到样本目标图像中的人物实例上的标准图像。样本源图像、样本目标图像以及样本换装图像都是已知的,可以从样本数据库中下载,也可以是来源于真实采集的图像,例如通过对不同人物进行实际换装的场景进行拍照得到。
可以理解的是,换装结果可以认为是换装网络所述的一个换装后的预测值,样本换装图像可以认为是一个换装后的真实值。损失函数可以用于表征换装网络所输出的换装结果(即预测值)与样本换装图像(即真实值)之间的差距程度。例如,损失函数越大,则表明换装结果与样本换装图像之间的差距程度越大;损失函数越小,则表明换装结果与样本换装图像之间的差距程度越小,越接近真实值,则说明换装网络的鲁棒性越好。
第一条件可以为所设定的损失阈值范围,例如,在该设定阈值范围内的损失函数,所对应的换装结果与样本换装图像的差距程度较小,满足换装网络的训练要求。
在本实施例中,换装网络基于样本源图像、样本目标图像和样本换装图像训练得到,在每次训练迭代过程中,若换装结果与样本换装图像之间的损失函数满足第一条件,则表明换装网络的训练结束;反之,若不满足第一条件,则返回执行S410,继续进行换装网络的迭代训练,直至损失函数满足第一条件为止。
S460、基于样本源图像和样本目标图像对换装网络进行测试。
本实施例中,在测试的过程中,在换装网络的输入端输入任意的两个图像,其中,两个图像可以是原来样本中的样本源图像和样本目标图像,也可以是所获取的任意两个图像。
例如,S460包括:将样本源图像中的服装变换到样本目标图像中的人物实例上,得到中间图像,并将中间图像的服装变换到样本源图像中的人物实例上,得到结果图像;根据结果图像与样本源图像的误差确定测试结果。
其中,误差可以为表征结果图像与源图像的特征信息差异程度的值,若误差很大,则表明换装网络的测试结果不好,换装网络需要进一步的训练;若误差很小,或者小于某一个设定阈值,则表明换装网络的测试效果良好,换装网络的训练和测试结束。
例如,用于测试的输入图像可以是样本源图像和样本目标图像,将样本源图像中人物实例A的服装变换到样本目标图像中人物实例B上,得到中间图像,中间图像中的人物实例记为B’,然后再将中间图像中人物实例B’的服装变换到样本源图像中人物实例A上,得到结果图像,结果图像中的人物实例记为A’。在此基础上,判断结果图像与样本源图像之间的误差(主要是人物实例A与A’的误差),即可确定测试结果。
例如,换装网络还可以共享重建网络的参数。重建网络可以认为是根据人像信息和姿态信息,对去掉人物实例的背景补全图像进行重建的神经网络。本实施例中利用重建网络对源图像进行重建,并将重建网络中学习到的特征变换规律以网络参数的形式提供给换装网络,辅助换装网络的训练学习,在提高换装网络训练效率的基础上,还可以提高换装网络的性能。
例如,重建网络用于在换装网络的训练过程中,在每次迭代过程中,执行以下操作,直至重建的图像与样本源图像之间的损失函数满足第二条件时,将重建网络的网络参数共享给换装网络:通过实例分割算法得到样本源图像的背景分割图;通过背景补全网络对样本源图像的背景分割图进行背景补全,得到第三背景图;通过重建网络,根据样本源图像的人像信息、样本源图像与样本源图像的姿态信息的交集、以及第三背景图进行图像重建,得到重建的图像。
其中,第二条件可以为所设定的一个损失阈值范围,例如,在该设定阈值范围内的损失函数,所对应的重建的图像与样本源图像之间的差距程度较小,表明重建网络的重建效果良好。在此基础上,当重建的图像与样本源图像之间的损失函数满足第二条件时,说明重建网络的重建效果良好,对应的网络参数具备可靠性,此时可以将重建网络的网络参数共享给换装网络。
图11为本公开实施例提供的一种基于重建网络重建样本源图像的实现示意图。如图11所示,G s表示重建网络,
Figure PCTCN2022137802-appb-000009表示样本源图像对应的第三背景图，Figure PCTCN2022137802-appb-000010表示样本源图像的人像信息中的人像分割信息，Figure PCTCN2022137802-appb-000011表示将样本源图像与样本源图像的三维人体姿态信息叠加后的图像（叠加的目的是，利用样本源图像的三维人体姿态信息对样本源图像取交集，使其范围缩小，服装特征有缺失；在与样本源图像的人像分割信息融合时，再恢复出完整的服装特征，从而实现重建）。在重建网络的输出端，O s表示输出结果，即根据Figure PCTCN2022137802-appb-000012和Figure PCTCN2022137802-appb-000013进行图像重建，所得到的重建图像。I s表示样本源图像，ζ表示重建的图像与样本源图像之间的损失函数。
图12为本公开实施例提供的一种训练换装网络的实现示意图。如图12所示,在换装网络的训练过程中,重建网络将自身训练完成并满足第二条件的网络参数共享给换装网络,在此基础上,换装网络利用重建网络的网络参数继续进行迭代训练。其中,ζ1表示重建的图像与样本源图像之间的损失函数,ζ2表示换装网络的换装结果与样本换装图像之间的损失函数。
本公开实施例提供的一种虚拟换装方法,具体化了通过对换装网路进行训练和测试的过程。利用该方法,在换装网络的训练过程中,通过将重建网络训练完成的可靠网络参数共享给换装网络,能够辅助换装网络的训练学习,并且在提高换装网络的训练效率的基础上还可以有效提高换装网络的性能。此外,通过对训练完成的换装网络进行测试,能够进一步保证换装网络的性能效果,从而提高虚拟换装的准确性。
图13为本公开实施例提供的一种虚拟换装方法的流程示意图,该方法可适用于对任意输入图像中的 人物实例进行换装的情况的情况,该方法可以由虚拟换装装置来执行,其中该装置可由软件和/或硬件实现,并一般集成在电子设备上,在本实施例中电子设备包括但不限于:台式计算机、笔记本电脑、服务器、平板电脑或智能手机等设备。需要说明的是,本实施例的虚拟换装方法,可针对指定服装,实现对任意一张输入图像的快速换装,而不需要用户提供源图像。因此可适用于简化的移动应用设备,例如手机或平板电脑等。本实施例尚未详尽的内容请参考上述任意实施例。
如图13所示,本公开实施例提供的一种虚拟换装方法,包括如下步骤:
S510、获取待换装图像。
在本实施例中,待换装图像可以认为是包含等待虚拟换装的人物实例的图像,例如,待换装图像可以是由用户拍摄或导入到电子设备的图像,也可以是从电子设备本地相册中所读取的图像,该图像中包含了待换装的人物实例。
S520、通过预设网络将待换装图像中人物实例的服装变换为指定服装。
其中,预设网络基于样本图像对训练得到,样本图像对包括样本目标图像和对应的样本换装图像,样本换装图像根据样本目标图像的姿态信息得到。
在本实施例中,预设网络可以认为是指所预先训练好的一个可进行服装虚拟变换的网络。指定服装可以认为是目标服装。样本目标图像可以指包含待换装的人物实例的样本图像,样本换装图像可以指根据样本目标图像的姿态信息将待换装的人物实例的服装变换为指定服装后所得到的图像。样本图像对可以指由大量的样本目标图像和对应的样本换装图像所构成的多个样本对。
需要说明的是,预设网络具备对任意输入图像中的人物实例进行指定服装的换装的能力,其不需要用户提供源图像。不同于换装网络,预设网络的训练过程只需要用到样本目标图像和样本换装图像即可,将样本目标图像作为输入,样本换装图像作为输出。其中,样本目标图像和对应的样本换装图像分别对应于待换装的人物实例换装前后的状态。
本实施例中,样本换装图像根据样本目标图像的姿态信息得到的。对于指定服装,目标图像中的人物实例的姿态不同,则换装过程中指定服装所需的位移变化量不同。需要说明的是,样本目标图像和样本换装图像可以来自于上述实施例中换装网络的训练样本;或者,样本目标图像可以是任意的包含人物实例的图像,而其对应的样本换装图像可以通过上述实施例中的换装网络生成,这种情况下,对于任意的包含人物实例的图像,根据提供目标服装的源图像,生成对应的换装图像的过程可参见上述任意实施例,此处对此不作详细的展开描述。
本实施例中,指定服装可以为一种默认的固定服装,在这种情况下,预设网络经过训练,可以专门用于将待换装图像中人物实例的服装变换为这个默认的固定服装。对于不同的服装,可以分别训练的一个专用的预设网络。在通过预设网络进行服装变换时所占用的计算资源少,所得到的换装结果更专业、更准确。
预设网络也可以用于多种服装的换装,指定服装也可以由用户(如待换装的用户)从服装图库中任意指定,在这种情况下,预设网络需要学习不同的服装变换到待换装的人物实例上的特征变换规律,因此在进行服装变换时对预设网络的性能要求有所提高,据此可以针对于不同的指定服装灵活的实现服装变换。
示例性的,预设网络可以是一种生成式对抗网络(Generative Adversarial Networks,GAN),例如基于GAN的图像到图像的翻译(pix2pix)模型算法网络。需要说明的是,预设网络是一个已训练好的单独的网络,可直接进行服装的变换,例如,对于指定的服装,给预设网络一个输入,如待换装图像,就能直接在预设网络的输出端得到对应的换装图像。
例如,根据样本目标图像的姿态信息得到样本换装图像的方法,可以根据上述实施例中任一项的虚拟换装方法确定。
例如,可以通过上述任意实施例中的虚拟换装方法,根据样本目标图像生成对应的样本换装图像,在此基础上,将样本目标图像和对应样本换装图像所构成的多个样本图像对作为预设网络的训练数据,以训练预设网络,使得预设网络可以通过训练学习到服装变换过程中特征变换的规律,具备相应的换装能力,从而可以根据一个输入,如待换装图像,得到相应的换装结果,以实现虚拟换装的实际应用。
例如,预设网络包括生成器和判别器;预设网络的训练过程包括:针对指定服装,通过生成器根据样本目标图像生成合成图像;通过判别器根据样本换装图像判别合成图像的真实性;重复上述生成合成图像以及判别合成图像的真实性的操作,直至判别结果满足训练停止条件。
其中,生成器可以指预设网络中用于通过学习真实图像的特征分布以生成特定图像的模块;例如针对指定服装,通过生成器根据样本目标图像的特征分布信息,以学习生成指定服装变换到样本目标图像中人物实例上的合成图像,该合成图像可以认为是通过生成器所生成的换装图像。判别器可以指预设网络中用于根据真实图像对生成器所生成的图像进行真假判别的模块;例如,通过判别器根据样本换装图像判别合成图像的真实性。
训练停止条件可以指根据判别器对合成图像真实性的判别结果所确定的停止训练的条件。例如训练停止条件可以是所有合成图像的判别结果都为真,或者也可以是判别结果为真的合成图像所占比例满足一个设定阈值,如百分之九十的合成图像的判别结果为真,此处对阈值的设定不做限定。
在预设网络训练的过程中,通过生成器和判别器去执行重复生成合成图像以及判别合成图像的真实性的操作,直至判别结果满足训练停止条件为止,预设网络训练完成。
本公开实施例提供的一种虚拟换装方法,该方法首先获取待换装图像,然后通过预设网络将所述待换装图像中人物实例的服装变换为指定服装;其中,所述预设网络基于样本图像对训练得到,所述样本图像对包括样本目标图像和对应的样本换装图像,所述样本换装图像根据所述样本目标图像的姿态信息得到。利用该方法,通过一个独立的预设网络,根据一个输入即可实现服装的虚拟变换,为用户在实际应用中提供了便利。此外,利用上述实施例中虚拟换装方法生成的样本图像对作为训练数据,为预设网络的训练提供了可靠依据,使得训练完成后的预设网络能够保证换装后的图像质量,并在此基础上实现了快速、准确的虚拟换装。
图14为本公开实施例提供的一种虚拟换装装置的结构示意图,其中该装置可由软件和/或硬件实现,并一般集成在电子设备上。
如图14所示,该装置包括:第一获取模块610、处理模块620以及第一换装模块630;
其中,第一获取模块610,设置为获取源图像和目标图像,所述源图像中包括关联于所述第一人物实例的目标服装,所述目标图像中包括第二人物实例;
处理模块620,设置为对所述源图像和所述目标图像进行处理,分别得到所述源图像的第一人像信息和第一姿态信息,以及所述目标图像的第二姿态信息;
第一换装模块630,设置为根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
在本实施例中,该装置首先通过第一获取模块610获取源图像和目标图像,源图像中包括关联于第一人物实例的目标服装,目标图像中包括第二人物实例;然后通过处理模块620对源图像和目标图像进行处理,分别得到源图像的第一人像信息和第一姿态信息,以及目标图像的第二姿态信息;最后通过第一换装模块630根据第一人像信息、第一姿态信息和第二姿态信息,将目标图像中第二人物实例的服装变换为源图像中的目标服装,得到换装图像。利用上述装置,通过利用源图像和目标图像的人像信息和姿态信息,能够实现目标服装在不同人物之间的变换,在此基础上,还可根据不同人物实例的姿态,实现对任意姿态的人物的换装,从而提高了虚拟换装的灵活性和准确性。
例如,第一换装模块630包括:
分割辅助信息确定单元,设置为将所述第一人像信息、所述第一姿态信息和所述第二姿态信息输入至分割辅助网络,得到所述目标服装在所述第二人物实例的姿态下的分割辅助信息;
像素位移通道图确定单元,设置为将所述第一人像信息、所述第一姿态信息、所述第二姿态信息以及所述分割辅助信息输入至服装位移辅助网络,得到所述目标服装由所述第一人物实例变换到所述第二人物实例的像素位移通道图;
换装图像确定单元,设置为根据所述第一姿态信息和所述第二姿态信息之间的姿态位移通道图、所述像素位移通道图以及所述分割辅助信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
例如,所述分割辅助网络为包括双分支输入的神经网络,
第一个分支的输入包括所述第一人像信息和所述第一姿态信息,其中,所述第一人像信息包括第一人像分割信息和第一人像解析信息,所述第一姿态信息包括所述第一人物实例的三维人体姿态信息和人体关键点信息;
第二个分支的输入包括所述第二姿态信息,所述第二姿态信息包括所述第二人物实例的三维人体姿态信息和人体关键点信息。
例如,所述服装位移辅助网络为包括双分支输入的神经网络,
第一个分支的输入包括所述第一人像信息和所述第一姿态信息,其中,所述第一人像信息包括第一人像分割信息和第一人像解析信息,所述第一姿态信息包括所述第一人物实例的三维人体姿态信息和人体关键点信息;
第二个分支的输入包括所述分割辅助信息和所述第二姿态信息,其中,所述分割辅助信息包括所述目标服装在所述第二人物实例的姿态下的第二人像分割信息和第二人像解析信息,所述第二姿态信息包括所述第二人物实例的三维人体姿态信息和人体关键点信息。
例如,所述换装图像确定单元包括:
分割图确定子单元,设置为通过实例分割算法得到所述目标图像的背景分割图;
第一背景图确定子单元,设置为通过背景补全网络对所述目标图像的背景分割图进行背景补全,得到第一背景图;
第二人物实例确定子单元,设置为根据所述第一姿态信息和所述第二姿态信息之间的姿态位移通道图、所述像素位移通道图以及所述分割辅助信息,通过换装网络将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装后的第二人物实例;
换装图像确定子单元,设置为将所述换装后的第二人物实例图像融合在所述第一背景图中,得到换装图像。
例如,所述第二人物实例确定子单元包括:
根据所述第一姿态信息和所述第二姿态信息确定所述姿态位移通道图;
将所述姿态位移通道图与所述像素位移通道图叠加,得到组合位移通道图;
将所述分割辅助信息以及所述组合位移通道图输入至所述换装网络,以通过所述换装网络将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装后的第二人物实例。
例如,所述装置还包括:根据所述第二人像信息确定所述第二人物实例的保护区域,在将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装的过程中,所述保护区域内的特征保持不变。
例如,所述换装网络基于样本源图像、样本目标图像和样本换装图像训练得到;
在上述实施例的基础上,所述换装网络的训练过程包括:在每次迭代过程中,执行以下操作,直至所述换装结果与所述样本换装图像之间的损失函数满足第一条件:
通过实例分割算法得到所述样本目标图像的背景分割图;
通过背景补全网络对所述样本目标图像的背景分割图进行背景补全,得到第二背景图;
通过所述换装网络将所述样本目标图像中的服装变换为所述样本源图像中的目标服装,得到换装后的人物实例;
将所述换装后的人物实例图像融合在所述第二背景图中,得到换装结果。
例如,所述换装网络共享重建网络的网络参数;
其中,所述重建网络设置为在所述换装网络的训练过程中,在每次迭代过程中,执行以下操作,直至重建的图像与样本所述源图像之间的损失函数满足第二条件时,将所述重建网络的网络参数共享给所述换装网络:
通过实例分割算法得到所述样本源图像的背景分割图;
通过所述背景补全网络对所述样本源图像的背景分割图进行背景补全,得到第三背景图;
通过重建网络,根据所述样本源图像的人像信息、所述样本源图像与所述样本源图像的姿态信息的交集、以及所述第三背景图进行图像重建,得到重建的图像。
例如,在所述装置中还包括:
测试模块,设置为基于样本源图像和样本目标图像对所述换装网络进行测试。
在上述实施例的基础上,所述对所述换装网络进行测试,包括:
将所述样本源图像中的服装变换到所述样本目标图像中的人物实例上,得到中间图像,并将所述中间图像的服装变换到所述样本源图像中的人物实例上,得到结果图像;
根据所述结果图像与所述样本源图像的误差确定测试结果。
上述虚拟换装装置可执行本公开实施例一至四所提供的虚拟换装方法,具备执行方法相应的功能模块和有益效果。
图15为本公开实施例提供的一种虚拟换装装置的结构示意图,其中该装置可由软件和/或硬件实现,并一般集成在电子设备上。
如图15所示,该装置包括:第二确定模块710以及第二换装模块720;
其中,第二确定模块710,设置为获取待换装图像;
第二换装模块720,设置为通过预设网络将所述待换装图像中人物实例的服装变换为指定服装;其中,所述预设网络基于样本图像对训练得到,所述样本图像对包括样本目标图像和对应的样本换装图像,所述样本换装图像根据所述样本目标图像的姿态信息得到。
在本实施例中,该装置首先通过第二确定模块710获取待换装图像,然后第二换装模块720通过预设网络将所述待换装图像中人物实例的服装变换为指定服装;其中,所述预设网络基于样本图像对训练得到,所述样本图像对包括样本目标图像和对应的样本换装图像,所述样本换装图像根据所述样本目标图像的姿态信息得到。利用该装置,通过一个独立的预设网络,根据一个输入即可实现服装的虚拟变换,为用户在实际应用中提供了便利。此外,利用上述实施例中虚拟换装方法生成的样本图像对作为训练数据,为预设网络的训练提供了可靠依据,使得训练完成后的预设网络能够保证换装后的图像质量,并在此基础上实现了快速、准确的虚拟换装。
例如,所述预设网络包括生成器和判别器;
所述预设网络的训练过程包括:针对所述指定服装,通过所述生成器根据所述样本目标图像生成合成图像;
通过所述判别器根据所述样本换装图像判别所述合成图像的真实性;
重复上述生成合成图像以及判别所述合成图像的真实性的操作,直至判别结果满足训练停止条件。
例如,根据所述样本目标图像的姿态信息得到所述样本换装图像的方法,根据上述任意实施例任一项所述的虚拟换装方法得到。
上述虚拟换装装置可执行本公开实施例所提供的虚拟换装方法,具备执行方法相应的功能模块和有益效果。
图16为本公开实施例提供的一种电子设备的结构示意图。图16示出了适于用来实现本公开实施例的电子设备400的结构示意图。图16示出的电子设备400仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图16所示,电子设备400可以包括一个或多个处理器(例如中央处理器、图形处理器等)401,其可以根据存储在只读存储器(Read-Only Memory,ROM)402中的程序或者从存储装置408加载到随机访问存储器(Random Access Memory,RAM)403中的程序而执行多种适当的动作和处理。一个或多个处理器401实现如本公开提供的虚拟换装方法。在RAM403中,还存储有电子设备400操作所需的多种程序和数据。处理器401、ROM 402以及RAM403通过总线404彼此相连。输入/输出(Input/Output,I/O)接口405也连接至总线404。
通常,以下装置可以连接至I/O接口405:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置406;包括例如液晶显示器(Liquid Crystal Display,LCD)、扬声器、振动器等的输出装置407;包括例如磁带、硬盘等的存储装置408,存储装置408设置为存储一个或多个程序;以及通信装置409。通信装置409可以允许电子设备400与其他设备进行无线或有线通信以交换数据。虽然图16示出了具有多种装置的电子设备400,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置409从网络上被下载和安装,或者从存储装置408被安装,或者从ROM402被安装。在该计算机程序被处理器401执行时,执行本公开实施例的方法中限定的上述功能。计算机可读存储介质可以为非暂态计算机可读存储介质。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是,但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如超文本传输协议(Hyper Text Transfer Protocol,HTTP)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(LAN),广域网(WAN),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备400中所包含的;也可以是单独存在,而未装配入该电子设备400中。
上述计算机可读介质存储有一个或者多个计算机程序,当上述一个或者多个程序被处理器执行时实现如下方法:上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备400:可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言,诸如Java、Smalltalk、C++,还包括常规的过 程式程序设计语言,诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN)连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开多种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的模块可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,模块的名称在某种情况下并不构成对该模块本身的限定。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(Field Programmable Gate Array,FPGA)、专用集成电路(Application Specific Integrated Circuit,ASIC)、专用标准产品(Application Specific Standard Parts,ASSP)、片上系统(System on Chip,SOC)、复杂可编程逻辑设备(Complex Programming logic device,CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开的一个或多个实施例,示例1提供了一种虚拟换装方法,包括:
获取源图像和目标图像,所述源图像中包括关联于第一人物实例的目标服装,所述目标图像中包括第二人物实例;
对所述源图像和所述目标图像进行处理,分别得到所述源图像的第一人像信息和第一姿态信息,以及所述目标图像的第二姿态信息;
根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
根据本公开的一个或多个实施例,示例2根据示例1所述的方法,
根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像,包括:
将所述第一人像信息、所述第一姿态信息和所述第二姿态信息输入至分割辅助网络,得到所述目标服装在所述第二人物实例的姿态下的分割辅助信息;
将所述第一人像信息、所述第一姿态信息、所述第二姿态信息以及所述分割辅助信息输入至服装位移辅助网络,得到所述目标服装由所述第一人物实例变换到所述第二人物实例的像素位移通道图;
根据所述第一姿态信息和所述第二姿态信息之间的姿态位移通道图、所述像素位移通道图以及所述分割辅助信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
根据本公开的一个或多个实施例,示例3根据示例2所述的方法,
所述分割辅助网络为包括双分支输入的神经网络,
第一个分支的输入包括所述第一人像信息和所述第一姿态信息,其中,所述第一人像信息包括第一人像分割信息和第一人像解析信息,所述第一姿态信息包括所述第一人物实例的三维人体姿态信息和人体关键点信息;
第二个分支的输入包括所述第二姿态信息,所述第二姿态信息包括所述第二人物实例的三维人体姿态信息和人体关键点信息。
根据本公开的一个或多个实施例,示例4根据示例2所述的方法,
所述服装位移辅助网络为包括双分支输入的神经网络,
第一个分支的输入包括所述第一人像信息和所述第一姿态信息,其中,所述第一人像信息包括第一人 像分割信息和第一人像解析信息,所述第一姿态信息包括所述第一人物实例的三维人体姿态信息和人体关键点信息;
第二个分支的输入包括所述分割辅助信息和所述第二姿态信息,其中,所述分割辅助信息包括所述目标服装在所述第二人物实例的姿态下的第二人像分割信息和第二人像解析信息,所述第二姿态信息包括所述第二人物实例的三维人体姿态信息和人体关键点信息。
根据本公开的一个或多个实施例,示例5根据示例2所述的方法,
根据所述第一姿态信息和所述第二姿态信息之间的姿态位移通道图、所述像素位移通道图以及所述分割辅助信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像,包括:
通过实例分割算法得到所述目标图像的背景分割图;
通过背景补全网络对所述目标图像的背景分割图进行背景补全,得到第一背景图;
根据所述第一姿态信息和所述第二姿态信息之间的姿态位移通道图、所述像素位移通道图以及所述分割辅助信息,通过换装网络将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装后的第二人物实例;
将所述换装后的第二人物实例图像融合在所述第一背景图中,得到换装图像。
根据本公开的一个或多个实施例,示例6根据示例5所述的方法,
根据所述第一姿态信息和所述第二姿态信息之间的姿态位移通道图、所述像素位移通道图以及所述分割辅助信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装后的第二人物实例,包括:
根据所述第一姿态信息和所述第二姿态信息确定所述姿态位移通道图;
将所述姿态位移通道图与所述像素位移通道图叠加,得到组合位移通道图;
将所述分割辅助信息以及所述组合位移通道图输入至所述换装网络,以通过所述换装网络将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装后的第二人物实例。
根据本公开的一个或多个实施例,示例7根据示例1所述的方法,还包括:
根据所述第二人像信息确定所述第二人物实例的保护区域,在将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装的过程中,所述保护区域内的特征保持不变。
根据本公开的一个或多个实施例,示例8根据示例5所述的方法,
所述换装网络基于样本源图像、样本目标图像和样本换装图像训练得到;
所述换装网络的训练过程包括:在每次迭代过程中,执行以下操作,直至所述换装结果与所述样本换装图像之间的损失函数满足第一条件:
通过实例分割算法得到所述样本目标图像的背景分割图;
通过背景补全网络对所述样本目标图像的背景分割图进行背景补全,得到第二背景图;
通过所述换装网络将所述样本目标图像中的服装变换为所述样本源图像中的目标服装,得到换装后的人物实例;
将所述换装后的人物实例图像融合在所述第二背景图中,得到换装结果。
根据本公开的一个或多个实施例,示例9根据示例8所述的方法,
所述换装网络共享重建网络的网络参数;
其中,所述重建网络用于在所述换装网络的训练过程中,在每次迭代过程中,执行以下操作,直至重建的图像与所述样本源图像之间的损失函数满足第二条件时,将所述重建网络的网络参数共享给所述换装网络:
通过实例分割算法得到所述样本源图像的背景分割图;
通过所述背景补全网络对所述样本源图像的背景分割图进行背景补全,得到第三背景图;
通过重建网络,根据所述样本源图像的人像信息、所述样本源图像与所述样本源图像的姿态信息的交集、以及所述第三背景图进行图像重建,得到重建的图像。
根据本公开的一个或多个实施例,示例10根据示例5所述的方法,还包括:基于样本源图像和样本目标图像对所述换装网络进行测试;
所述对所述换装网络进行测试,包括:
将所述样本源图像中的服装变换到所述样本目标图像中的人物实例上,得到中间图像,并将所述中间图像的服装变换到所述样本源图像中的人物实例上,得到结果图像;
根据所述结果图像与所述样本源图像的误差确定测试结果。
根据本公开的一个或多个实施例,示例11提供了一种虚拟换装方法,包括:
获取待换装图像;
通过预设网络将所述待换装图像中人物实例的服装变换为指定服装;
其中,所述预设网络基于样本图像对训练得到,所述样本图像对包括样本目标图像和对应的样本换装图像,所述样本换装图像根据所述样本目标图像的姿态信息得到。
根据本公开的一个或多个实施例,示例12根据示例11所述的方法,
所述预设网络包括生成器和判别器;
所述预设网络的训练过程包括:针对所述指定服装,通过所述生成器根据所述样本目标图像生成合成图像;
通过所述判别器根据所述样本换装图像判别所述合成图像的真实性;
重复上述生成合成图像以及判别所述合成图像的真实性的操作,直至判别结果满足训练停止条件。
根据本公开的一个或多个实施例,示例13根据示例12所述的方法,根据所述样本目标图像的姿态信息得到所述样本换装图像的方法,根据示例1-10任一项所述的虚拟换装方法得到。
根据本公开的一个或多个实施例,示例14提供了一种虚拟换装装置,包括:
第一获取模块,设置为获取源图像和目标图像,所述源图像中包括关联于第一人物实例的目标服装,所述目标图像中包括第二人物实例;
处理模块,设置为对所述源图像和所述目标图像进行处理,分别得到所述源图像的第一人像信息和第一姿态信息,以及所述目标图像的第二姿态信息;
第一换装模块,设置为根据所述第一人像信息、所述第一姿态信息和所述第二姿态信息,将所述目标图像中所述第二人物实例的服装变换为所述源图像中的目标服装,得到换装图像。
According to one or more embodiments of the present disclosure, Example 15 provides a virtual dress-up apparatus, including:
a second determination module configured to acquire an image to be dressed up;
a second dress-up module configured to transform clothing of a person instance in the image to be dressed up into specified clothing through a preset network, wherein the preset network is trained based on sample image pairs, each sample image pair includes a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to pose information of the sample target image.
According to one or more embodiments of the present disclosure, Example 16 provides an electronic device, including:
one or more processors;
a storage apparatus configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of Examples 1-13.
According to one or more embodiments of the present disclosure, Example 17 provides a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of Examples 1-13.
Furthermore, although multiple operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.

Claims (17)

  1. A virtual dress-up method, comprising:
    acquiring a source image and a target image, wherein the source image comprises a target clothing associated with a first person instance, and the target image comprises a second person instance;
    processing the source image and the target image to obtain first portrait information and first pose information of the source image, and second pose information of the target image, respectively;
    transforming, according to the first portrait information, the first pose information, and the second pose information, clothing of the second person instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  2. The method according to claim 1, wherein the transforming, according to the first portrait information, the first pose information, and the second pose information, the clothing of the second person instance in the target image into the target clothing in the source image to obtain the dress-up image comprises:
    inputting the first portrait information, the first pose information, and the second pose information into a segmentation auxiliary network to obtain segmentation auxiliary information of the target clothing under a pose of the second person instance;
    inputting the first portrait information, the first pose information, the second pose information, and the segmentation auxiliary information into a clothing displacement auxiliary network to obtain a pixel displacement channel map of the target clothing transformed from the first person instance to the second person instance;
    transforming, according to a pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map, and the segmentation auxiliary information, the clothing of the second person instance in the target image into the target clothing in the source image, to obtain the dress-up image.
  3. The method according to claim 2, wherein the segmentation auxiliary network is a neural network comprising a dual-branch input,
    an input of a first branch of the segmentation auxiliary network comprises the first portrait information and the first pose information, wherein the first portrait information comprises first portrait segmentation information and first portrait parsing information, and the first pose information comprises three-dimensional human body pose information and human body keypoint information of the first person instance;
    an input of a second branch of the segmentation auxiliary network comprises the second pose information, and the second pose information comprises three-dimensional human body pose information and human body keypoint information of the second person instance.
  4. The method according to claim 2, wherein the clothing displacement auxiliary network is a neural network comprising a dual-branch input,
    an input of a first branch of the clothing displacement auxiliary network comprises the first portrait information and the first pose information, wherein the first portrait information comprises first portrait segmentation information and first portrait parsing information, and the first pose information comprises three-dimensional human body pose information and human body keypoint information of the first person instance;
    an input of a second branch of the clothing displacement auxiliary network comprises the segmentation auxiliary information and the second pose information, wherein the segmentation auxiliary information comprises second portrait segmentation information and second portrait parsing information of the target clothing under the pose of the second person instance, and the second pose information comprises three-dimensional human body pose information and human body keypoint information of the second person instance.
  5. The method according to claim 2, wherein the transforming, according to the pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map, and the segmentation auxiliary information, the clothing of the second person instance in the target image into the target clothing in the source image to obtain the dress-up image comprises:
    obtaining a background segmentation map of the target image through an instance segmentation algorithm;
    performing background completion on the background segmentation map of the target image through a background completion network to obtain a first background image;
    transforming, according to the pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map, and the segmentation auxiliary information, the clothing of the second person instance in the target image into the target clothing in the source image through a dress-up network, to obtain a dressed-up second person instance;
    fusing an image of the dressed-up second person instance into the first background image to obtain the dress-up image.
  6. The method according to claim 5, wherein the transforming, according to the pose displacement channel map between the first pose information and the second pose information, the pixel displacement channel map, and the segmentation auxiliary information, the clothing of the second person instance in the target image into the target clothing in the source image to obtain the dressed-up second person instance comprises:
    determining the pose displacement channel map according to the first pose information and the second pose information;
    superimposing the pose displacement channel map and the pixel displacement channel map to obtain a combined displacement channel map;
    inputting the segmentation auxiliary information and the combined displacement channel map into the dress-up network, so as to transform, through the dress-up network, the clothing of the second person instance in the target image into the target clothing in the source image, to obtain the dressed-up second person instance.
  7. The method according to claim 1, further comprising:
    determining a protected region of the second person instance according to second portrait information, wherein features within the protected region remain unchanged during the process of transforming the clothing of the second person instance in the target image into the target clothing in the source image.
  8. The method according to claim 5, wherein the dress-up network is obtained by iterative training based on a sample source image, a sample target image, and a sample dress-up image;
    a training process of the dress-up network comprises: in each iteration, performing the following operations until a loss function between a dress-up result and the sample dress-up image satisfies a first condition:
    obtaining a background segmentation map of the sample target image through the instance segmentation algorithm;
    performing background completion on the background segmentation map of the sample target image through the background completion network to obtain a second background image;
    transforming, through the dress-up network, clothing in the sample target image into target clothing in the sample source image to obtain a dressed-up person instance;
    fusing an image of the dressed-up person instance into the second background image to obtain the dress-up result.
  9. The method according to claim 8, wherein the dress-up network shares network parameters of a reconstruction network;
    wherein the reconstruction network is configured to, during the training process of the dress-up network, perform the following operations in each iteration, and share the network parameters of the reconstruction network with the dress-up network when a loss function between a reconstructed image and the sample source image satisfies a second condition:
    obtaining a background segmentation map of the sample source image through the instance segmentation algorithm;
    performing background completion on the background segmentation map of the sample source image through the background completion network to obtain a third background image;
    performing, through the reconstruction network, image reconstruction according to portrait information of the sample source image, an intersection of the sample source image and pose information of the sample source image, and the third background image, to obtain the reconstructed image.
  10. The method according to claim 5, further comprising: testing the dress-up network based on a sample source image and a sample target image;
    the testing the dress-up network comprises:
    transforming clothing in the sample source image onto a person instance in the sample target image to obtain an intermediate image, and transforming clothing of the intermediate image onto a person instance in the sample source image to obtain a result image;
    determining a test result according to an error between the result image and the sample source image.
  11. A virtual dress-up method, comprising:
    acquiring an image to be dressed up;
    transforming clothing of a person instance in the image to be dressed up into specified clothing through a preset network;
    wherein the preset network is trained based on sample image pairs, each sample image pair comprises a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to pose information of the sample target image.
  12. The method according to claim 11, wherein
    the preset network comprises a generator and a discriminator;
    a training process of the preset network comprises:
    for the specified clothing, generating a synthetic image from the sample target image through the generator;
    discriminating, through the discriminator, authenticity of the synthetic image against the sample dress-up image;
    repeating the above operations of generating the synthetic image and discriminating the authenticity of the synthetic image until a discrimination result satisfies a training stop condition.
  13. The method according to claim 12, wherein, before the transforming the clothing of the person instance in the image to be dressed up into the specified clothing through the preset network, the sample dress-up image is determined according to the virtual dress-up method of any one of claims 1-10.
  14. A virtual dress-up apparatus, comprising:
    a first acquisition module configured to acquire a source image and a target image, wherein the source image comprises a target clothing associated with a first person instance, and the target image comprises a second person instance;
    a processing module configured to process the source image and the target image to obtain first portrait information and first pose information of the source image, and second pose information of the target image, respectively;
    a first dress-up module configured to transform, according to the first portrait information, the first pose information, and the second pose information, clothing of the second person instance in the target image into the target clothing in the source image, to obtain a dress-up image.
  15. A virtual dress-up apparatus, comprising:
    a second determination module configured to acquire an image to be dressed up;
    a second dress-up module configured to transform clothing of a person instance in the image to be dressed up into specified clothing through a preset network, wherein the preset network is trained based on sample image pairs, each sample image pair comprises a sample target image and a corresponding sample dress-up image, and the sample dress-up image is obtained according to pose information of the sample target image.
  16. An electronic device, comprising:
    one or more processors;
    a storage apparatus configured to store one or more programs;
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the virtual dress-up method according to any one of claims 1-13.
  17. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the virtual dress-up method according to any one of claims 1-13.
PCT/CN2022/137802 2021-12-15 2022-12-09 Virtual dress-up method and apparatus, electronic device, and readable medium WO2023109666A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111539373.2A CN116309005A (zh) 2021-12-15 2021-12-15 Virtual dress-up method and apparatus, electronic device, and readable medium
CN202111539373.2 2021-12-15

Publications (1)

Publication Number Publication Date
WO2023109666A1 true WO2023109666A1 (zh) 2023-06-22

Family

ID=86774897

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137802 WO2023109666A1 (zh) 2021-12-15 2022-12-09 Virtual dress-up method and apparatus, electronic device, and readable medium

Country Status (2)

Country Link
CN (1) CN116309005A (zh)
WO (1) WO2023109666A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117635763A (zh) * 2023-11-28 2024-03-01 广州像素数据技术股份有限公司 Automatic dress-up method, apparatus, device, and medium based on portrait component analysis

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067299A (zh) * 2017-03-29 2017-08-18 深圳奥比中光科技有限公司 Virtual fitting method and system
JP2018113060A (ja) * 2018-03-14 2018-07-19 株式会社東芝 Virtual try-on device, virtual try-on system, virtual try-on method, and program
US20190371080A1 (en) * 2018-06-05 2019-12-05 Cristian SMINCHISESCU Image processing method, system and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENYU XIE; ZAIYU HUANG; FUWEI ZHAO; HAOYE DONG; MICHAEL KAMPFFMEYER; XIAODAN LIANG: "Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 20 November 2021 (2021-11-20), 201 Olin Library Cornell University Ithaca, NY 14853, XP091100429 *

Also Published As

Publication number Publication date
CN116309005A (zh) 2023-06-23

Similar Documents

Publication Publication Date Title
JP7373554B2 (ja) Cross-domain image conversion
EP3972239A1 (en) Method and apparatus for virtual fitting
KR20210094451A (ko) Image generation method and apparatus
CN112967212A (zh) Virtual character synthesis method, apparatus, device, and storage medium
JP7361060B2 (ja) 3D joint point regression model generation method and apparatus, electronic device, computer-readable storage medium, and computer program
CN106462572A (zh) Techniques for distributed optical character recognition and distributed machine language translation
WO2021258971A1 (zh) Virtual clothing change method and apparatus, device, and medium
CN106415605A (zh) Techniques for distributed optical character recognition and distributed machine language translation
CN109754464B (zh) Method and apparatus for generating information
WO2022233223A1 (zh) Image stitching method, apparatus, device, and medium
WO2023109666A1 (zh) Virtual dress-up method and apparatus, electronic device, and readable medium
US20160086365A1 (en) Systems and methods for the conversion of images into personalized animations
WO2023184817A1 (zh) Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product
CN111246196B (zh) Video processing method and apparatus, electronic device, and computer-readable storage medium
JP2023526899A (ja) Method, device, medium, and program product for generating an image inpainting model
US20140198177A1 (en) Realtime photo retouching of live video
WO2022171114A1 (zh) Image processing method, apparatus, device, and medium
CN117094362B (zh) Task processing method and related apparatus
CN115731341A (zh) Three-dimensional human head reconstruction method, apparatus, device, and medium
Luvizon et al. Scene‐Aware 3D Multi‐Human Motion Capture from a Single Camera
CN113689372A (zh) Image processing method, device, storage medium, and program product
CN115775310A (zh) Data processing method and apparatus, electronic device, and storage medium
WO2024041235A1 (zh) Image processing method, apparatus, device, storage medium, and program product
WO2023143118A1 (zh) Image processing method, apparatus, device, and medium
CN111368668A (zh) Three-dimensional hand recognition method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22906414

Country of ref document: EP

Kind code of ref document: A1