CN117422851A - Virtual clothes changing method and device and electronic equipment - Google Patents

Virtual clothes changing method and device and electronic equipment

Info

Publication number
CN117422851A
CN117422851A (application CN202311441036.9A)
Authority
CN
China
Prior art keywords
image
feature
clothes
human body
segmentation
Prior art date
Legal status
Pending
Application number
CN202311441036.9A
Other languages
Chinese (zh)
Inventor
罗丹 (Luo Dan)
Current Assignee
Guangzhou Shangyan Network Technology Co., Ltd.
Original Assignee
Guangzhou Shangyan Network Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Guangzhou Shangyan Network Technology Co., Ltd.
Priority to CN202311441036.9A
Publication of CN117422851A

Classifications

    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/0475 Generative networks
    • G06N 3/094 Adversarial learning
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/10 Segmentation; Edge detection
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2219/2024 Style variation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Architecture (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a virtual clothes changing method and device, and an electronic device. The method includes: acquiring first feature data of a human body image of a target object, and acquiring second feature data of a selected garment image; performing feature fusion on the human body image and the garment image based on the first feature data and the second feature data to obtain a target segmentation feature and a target garment optical flow map; generating a deformed image of the garment image according to the target garment optical flow map and the target segmentation feature, and obtaining a mask segmentation image of the garment image from the deformed image; and obtaining an initial clothes-changing image from the deformed image and the mask segmentation image, then performing image restoration on the initial clothes-changing image based on a diffusion model to obtain a target clothes-changing image in which the clothes changing operation has been applied to the target object. In this way, body regions that go missing from the person image when the garment is fused in during virtual clothes changing can be repaired.

Description

Virtual clothes changing method and device and electronic equipment
Technical Field
The present application belongs to the technical field of artificial intelligence, relates to image processing technology, and in particular relates to a virtual clothes changing method and device, and an electronic device.
Background
In the related art, virtual clothes changing can be applied to a person in a captured image through image processing, meeting the user's need to change the outfit of the person in the image; for example, a user can try garments on online or obtain images of a person wearing different garments. However, on the one hand, the human body in the person images used by related virtual clothes changing techniques is usually in a frontal, upright pose; with such a single pose, the wearing effect of garments under different degrees of deformation cannot be presented to the user across a variety of body poses. On the other hand, when the body pose in the person image varies greatly, shortcomings in the image fusion algorithm may prevent the garment in the virtual clothes-changing image from deforming properly, so that the garment fails to fit the body pose or body parts are lost when the garment is fused into the person image.
Disclosure of Invention
The embodiments of the present application provide a virtual clothes changing method and device, and an electronic device, which can solve the technical problem that, because the garment cannot deform properly, it fails to fit the body pose or body parts are lost when the garment is fused into a person image.
A first aspect of the embodiments of the present application provides a virtual clothes changing method, including: acquiring first feature data of a human body image of a target object, and acquiring second feature data of a selected garment image; performing feature fusion on the human body image and the garment image based on the first feature data and the second feature data to obtain a target segmentation feature and a target garment optical flow map; generating a deformed image of the garment image according to the target garment optical flow map and the target segmentation feature, and obtaining a mask segmentation image of the garment image from the deformed image; and obtaining an initial clothes-changing image from the deformed image and the mask segmentation image, and performing image restoration on the initial clothes-changing image based on a diffusion model to obtain a target clothes-changing image in which the clothes changing operation has been applied to the target object.
According to an embodiment of the present application, acquiring the first feature data of the human body image of the target object includes: performing image segmentation on the human body image to obtain a first segmentation map of the human body in the human body image, and obtaining a pose depth map of the human body in the human body image, where the human body is that of the target object; performing image fusion on the first segmentation map and the pose depth map to obtain a first fused image; performing multi-scale feature extraction on the first fused image to obtain human body features at multiple resolutions; upsampling the lowest-resolution human body feature among them to obtain an initial segmentation feature; and taking the multi-resolution human body features and the initial segmentation feature as the first feature data.
According to an embodiment of the present application, acquiring the second feature data of the selected garment image includes: performing image segmentation on the garment image to obtain a second segmentation map of the garment in the garment image; performing image fusion on the garment image and the second segmentation map to obtain a second fused image; performing multi-scale feature extraction on the second fused image to obtain garment features at multiple resolutions; obtaining an initial garment optical flow map based on the lowest-resolution garment feature among them; and taking the multi-resolution garment features and the initial garment optical flow map as the second feature data.
According to an embodiment of the present application, performing feature fusion on the human body image and the garment image based on the first feature data and the second feature data to obtain a target segmentation feature and a target garment optical flow map includes: iteratively updating, a preset number of times, the initial segmentation feature in the first feature data and the initial garment optical flow map in the second feature data to obtain the target garment optical flow map and the target segmentation feature, where each of the iterative updates includes: performing an affine transformation on the garment feature Eci in the second feature data according to the garment optical flow map Ff(i-1) to obtain an affine-transformed garment feature Eci, where i denotes the update index, Ff0 (i.e., i=1) denotes the initial garment optical flow map, and the resolution of the garment feature Eci is smaller than that of the garment feature Ec(i+1); fusing the human body feature Esi in the first feature data, the segmentation feature Fs(i-1) and the affine-transformed garment feature Eci to obtain a first multi-channel image, where the resolution of the human body feature Esi is smaller than that of the human body feature Es(i+1), and Fs0 (i.e., i=1) denotes the initial segmentation feature; extracting segmentation features from the first multi-channel image to obtain a segmentation feature Fsi whose resolution is greater than that of the segmentation feature Fs(i-1); and extracting garment optical flow features from the first multi-channel image to obtain a garment optical flow map Ffi whose resolution is greater than that of the garment optical flow map Ff(i-1).
According to an embodiment of the present application, performing the affine transformation on the garment feature Eci in the second feature data according to the garment optical flow map Ff(i-1) to obtain the affine-transformed garment feature Eci includes: according to the coordinates of each pixel in the upsampled garment optical flow map Ff(i-1), assigning the pixel value of each pixel in the garment feature Eci to the pixel at the corresponding coordinates in the garment optical flow map Ff(i-1), to obtain the affine-transformed garment feature Eci.
According to an embodiment of the present application, before the affine transformation of the garment feature Eci according to the garment optical flow map Ff(i-1), the method further includes: upsampling the garment optical flow map Ff(i-1). Before the segmentation feature extraction is performed on the first multi-channel image, the method further includes: upsampling the first multi-channel image.
According to an embodiment of the present application, generating the deformed image of the garment image according to the target garment optical flow map and the target segmentation feature, and obtaining the mask segmentation image of the garment image from the deformed image, includes: performing an affine transformation on the garment image according to the target garment optical flow map to obtain the deformed image; and multiplying the target segmentation feature and the deformed image pixel by pixel, then normalizing the product image to obtain the mask segmentation image.
According to an embodiment of the present application, performing image restoration on the initial clothes-changing image based on the diffusion model includes: when a missing body part exists in the initial clothes-changing image, repairing the missing body part in the initial clothes-changing image using the diffusion model.
A second aspect of the embodiments of the present application provides a virtual clothes changing device, including: a data acquisition module for acquiring first feature data of a human body image and second feature data of a garment image; a feature fusion module for performing feature fusion on the human body image and the garment image based on the first feature data and the second feature data to obtain a target garment optical flow map and a target segmentation feature; an image generation module for generating a deformed image of the garment image according to the target garment optical flow map and the target segmentation feature, and obtaining a mask segmentation image of the garment image from the deformed image; and an image restoration module for obtaining an initial clothes-changing image from the deformed image and the mask segmentation image, and performing image restoration on the initial clothes-changing image based on a diffusion model to obtain a target clothes-changing image.
A third aspect of the embodiments of the present application provides an electronic device, including a memory and a processor, where the processor executes computer-readable instructions stored in the memory to implement the virtual clothes changing method described above.
The virtual clothes changing method provided by the embodiments of the present application divides the virtual clothes changing process into two stages: a garment deformation stage and a clothes-changing image generation stage. In the garment deformation stage, a multi-scale feature fusion method deforms the garment according to the body pose so that it can be put on the human body; the deformation result is predicted at different resolutions and progressively refined from low to high resolution, bottom-up, to achieve a better deformation effect. In the clothes-changing image generation stage, the deformed garment is put on the human body with a diffusion-model-based method, and body regions lost to factors such as occlusion by the original garment are repaired; regions where the garment on the body and the product garment are misaligned can thus be repaired better, achieving a better virtual clothes changing effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
Fig. 1 is an application environment schematic diagram of a virtual clothes changing method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a virtual clothes changing method according to an embodiment of the present application.
Fig. 3 is a flowchart of a method for acquiring first feature data according to an embodiment of the present application.
Fig. 4 is a flowchart of a method for acquiring second feature data according to an embodiment of the present application.
Fig. 5 is a flowchart of a method for each update in a preset number of iterative updates according to an embodiment of the present application.
Fig. 6 is an exemplary diagram of a feature fusion module provided in an embodiment of the present application.
Fig. 7 is an exemplary diagram of a mask segmentation image generation flow provided in an embodiment of the present application.
Fig. 8 is an exemplary diagram of a generation flow of a target clothes-changing image according to an embodiment of the present application.
Fig. 9 is a schematic block diagram of a virtual clothes changing device according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.
It should be noted that "at least one" in this application means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and covers three cases; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The terms "first," "second," "third," "fourth" and the like in the description, claims and drawings, if any, are used to distinguish similar objects and do not necessarily describe a particular sequence or chronological order.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion. The following embodiments and features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of an application environment of a virtual clothes changing method according to an embodiment of the present application. As shown in Fig. 1, a user terminal 10 communicates with a server 20 through a network, which may be wired or wireless. The wired network may be a local area network, a metropolitan area network, or a wide area network; the wireless network may use wireless fidelity (Wi-Fi), ZigBee ad hoc wireless networking, ultra-wideband (UWB) technology, wireless universal serial bus (wireless USB), and the like.
The user terminal 10 may be an electronic device such as a mobile phone, a tablet computer, a multimedia playback device, a personal computer (PC), or a wearable device. The user terminal 10 may be a client in which various applications are installed, for example education applications, consultation applications, information broadcasting applications, live-streaming applications, e-commerce applications, and the like.
The server 20 provides background services for the applications in the user terminal 10. For example, the server 20 may be a background server of the above e-commerce application. The server 20 may be a computer device; in the embodiments of the present application it may be a single server, a server cluster formed by multiple servers, or a cloud computing service center.
Take an e-commerce application as an example. The user performs an operation of selecting a garment image and a human body image in a client (e.g., the user terminal 10) in which the e-commerce application is installed, and the user terminal 10 transmits the garment image and the human body image to the server 20. The server 20 performs feature extraction, semantic segmentation and similar operations on the garment image and the human body image respectively, and generates an optical flow map of the garment based on fusing the features of the two images. The server 20 then generates a deformed image of the garment image according to the garment optical flow map, so that the deformed image can be accurately fused with the body of the target object in the human body image to obtain a clothes-changing image. Finally, the server 20 determines whether problems such as missing body parts exist in the clothes-changing image, and performs image restoration on the problem regions based on a diffusion model. In this way, the server 20 can generate a clothes-changing image from the garment image and the human body image selected by the user, such that the garment in the clothes-changing image accurately fits the body pose and no body parts are lost, providing the user with a better virtual clothes changing experience and convenient online fitting.
The method will be described below as running on a server, such as the server 20 shown in Fig. 1. Referring to Fig. 2, a schematic flow chart of the virtual clothes changing method provided in an embodiment of the present application, the method includes the following steps:
step S201, acquiring first feature data of a human body image of a target object, and acquiring second feature data of a selected clothing image.
In some embodiments of the present application, the target object may be any object the user selects for virtual clothes changing; for example, it may be a merchant's model, or a person for whom the user wants to perform virtual clothes changing, such as the user himself or a family member.
In some embodiments of the present application, the human body image of the target object is an image containing the target object's body. For example, when the user wants a virtual fitting of a coat, the human body image contains the target object's upper body; when the user wants a virtual fitting of pants, it contains the target object's lower body.
In some embodiments of the present application, when the target object is a merchant's model, an image of the model chosen by the user from a database provided by the merchant may serve as the human body image of the target object; when the target object is a person for whom the user wants to perform virtual clothes changing, an image of that person uploaded by the user may serve as the human body image. The human body image of the target object may also be an image frame captured from a video of the target object.
In some embodiments of the present application, the human body in the human body image may take a variety of poses, for example natural poses with both hands hanging naturally, one hand on the hip, both hands on the hips, arms stretched out, and so on. For each merchant model, the merchant's database may include body images of the model in these various poses.
In some embodiments of the present application, when the target object is a merchant's model, each human body image may further carry the model's body data, such as height and weight, so that the user can select a human body image whose body data is similar to the user's own, providing a better virtual fitting experience.
In some embodiments of the present application, the garment image is an image of a garment the user selects for virtual fitting, where each garment image carries garment data such as its size, to help the user choose the desired garment. The garments in the garment images may be of various types, for example jackets or short-sleeved tops with sleeves, sleeveless tops, pants, shorts, skirts, and the like.
In some embodiments of the present application, virtual clothes changing based on the human body image and the garment image can be divided into two steps: one is generating a deformed image of the garment in a shape matching the body pose in the human body image, and the other is fusing the deformed image onto the body, so that each part of the garment in the deformed image accurately fits the corresponding body part. Virtual clothes changing is thus achieved by fusing the features of the human body image and the garment image, so these features are first acquired separately: the first feature data of the human body image may be extracted with a preset first encoder for human body image features, and the second feature data of the garment image with a preset second encoder for garment image features.
In some embodiments of the present application, when the first feature data of the human body image of the target object is acquired, a pose depth map containing body pose information and a body segmentation map containing the region of each body part may be acquired and fused, and feature data is then extracted from the fused image, yielding first feature data that provides both the body pose information and the per-part region information needed for virtual clothes changing. Specifically, the first feature data includes human body features at multiple resolutions and an initial segmentation feature, the latter obtained by upsampling the lowest-resolution human body feature. The acquisition of the first feature data may also proceed as in the embodiment shown in Fig. 3.
In some embodiments of the present application, when the second feature data of the selected garment image is acquired, a segmentation map of the garment's parts may be acquired and fused with the garment image, and feature data extracted from the fused image, yielding feature data that contains information about each part of the garment; an initial garment optical flow map can also be derived from this feature data, providing a reference for obtaining the deformed garment image. Specifically, the second feature data includes garment features at multiple resolutions and an initial garment optical flow map obtained by extracting optical flow features from the lowest-resolution garment feature. The second feature data may be acquired as in the embodiment shown in Fig. 4.
Step S202: perform feature fusion on the human body image and the garment image based on the first feature data and the second feature data to obtain a target segmentation feature and a target garment optical flow map.
In some embodiments of the present application, this feature fusion includes: iteratively updating, a preset number of times, the initial segmentation feature in the first feature data and the initial garment optical flow map in the second feature data, to obtain the target garment optical flow map and the target segmentation feature.
In some embodiments of the present application, since the initial segmentation feature is derived from the lowest-resolution human body feature and the initial optical flow map from the lowest-resolution garment feature, both have low resolution and can hardly meet practical requirements. The feature fusion process therefore iteratively updates the initial segmentation feature and the initial optical flow map a preset number of times to raise their resolutions, producing a higher-resolution target garment optical flow map and target segmentation feature and laying the groundwork for generating a higher-resolution deformed garment image.
The preset number of times can be set as needed, for example according to the number of human body features or garment features at multiple resolutions. Alternatively, the iteration can stop once the resolutions of the updated garment optical flow map and segmentation feature reach a preset threshold.
In some embodiments of the present application, as shown in Fig. 5, each of the preset number of iterative updates includes the following steps:
step S501, affine transformation is carried out on the clothes features Eci in the second feature data according to the clothes light flow graph Ff (i-1) to obtain affine transformed clothes features Eci.
In some embodiments of the present application, before the affine transformation of the garment feature Eci according to the garment optical flow map Ff(i-1), the method further includes upsampling the garment optical flow map Ff(i-1), where i denotes the update index and Ff0 (i.e., i=1) denotes the initial garment optical flow map.
In some embodiments of the present application, since the garment image may be smaller than the human body image, and hence the garment may be smaller than the corresponding body region to be dressed, the garment optical flow map Ff(i-1) is upsampled to enlarge it.
In some embodiments of the present application, the affine transformation of the garment feature Eci according to the garment optical flow map Ff(i-1) includes: according to the coordinates of each pixel in the upsampled garment optical flow map Ff(i-1), assigning the pixel value of each pixel in the garment feature Eci to the pixel at the corresponding coordinates in the garment optical flow map Ff(i-1), yielding the affine-transformed garment feature Eci. The resolution of the garment feature Eci is smaller than that of the garment feature Ec(i+1).
Specifically, the second feature data includes garment features Eci at multiple resolutions, where the resolution of the i-th garment feature Eci is smaller than that of the (i+1)-th garment feature Ec(i+1), i being a positive integer greater than or equal to 1; if there are n garment features in total, then (i+1) is less than or equal to n. For example, when n equals 4, the garment features include Ec1 at a resolution of 16 x 16, Ec2 at 32 x 32, Ec3 at 64 x 64, and Ec4 at 128 x 128.
In some embodiments of the present application, the affine transformation deforms the garment feature Eci into the shape encoded by the garment optical flow map Ff(i-1), which facilitates generating the deformed garment image in subsequent steps. Specifically, the affine transformation may use the torch.grid_sample() function to sample, for each coordinate given by the garment optical flow map Ff(i-1), a pixel from the garment feature Eci and assign it to the corresponding position; since the coordinates given by the flow map may not fall exactly on pixels of the garment feature Eci, the pixel values may be fetched from Eci by nearest-neighbor or bilinear interpolation. The final output is the affine-transformed image, which has the desired alignment property. A sketch follows.
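To make the warp concrete, below is a minimal PyTorch sketch of the flow-based affine transformation described above. The patent names torch.grid_sample() but not the tensor layout; the (N, 2, H, W) flow layout, the offset normalization, and the bilinear mode are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a garment feature map by a garment optical flow map.

    feat: (N, C, H, W) garment feature Eci.
    flow: (N, 2, H, W) per-pixel offsets Ff(i-1), already at (H, W).
    """
    n, _, h, w = feat.shape
    # Base sampling grid with coordinates in [-1, 1], as grid_sample expects.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=feat.device),
        torch.linspace(-1.0, 1.0, w, device=feat.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    # Normalize pixel offsets into the [-1, 1] grid coordinate system.
    offset = torch.stack(
        (flow[:, 0] * 2.0 / max(w - 1, 1), flow[:, 1] * 2.0 / max(h - 1, 1)),
        dim=-1,
    )
    # Bilinear sampling pulls each output pixel from the flow-displaced location.
    return F.grid_sample(feat, base + offset, mode="bilinear", align_corners=True)
```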
Step S502: fuse the human body feature Esi in the first feature data, the segmentation feature Fs(i-1) and the affine-transformed garment feature Eci to obtain a first multi-channel image.
In some embodiments of the present application, the resolution of the human body feature Esi is smaller than that of the human body feature Es(i+1), and Fs0 (i.e., i=1) denotes the initial segmentation feature. Specifically, the first feature data includes human body features Esi at multiple resolutions, where the resolution of the i-th human body feature Esi is smaller than that of the (i+1)-th feature Es(i+1), i being a positive integer greater than or equal to 1; if there are n human body features in total, then (i+1) is less than or equal to n.
For example, when n equals 4, the human body features include Es1 at a resolution of 16 x 16, Es2 at 32 x 32, Es3 at 64 x 64, and Es4 at 128 x 128. The segmentation feature Fs0 denotes the initial segmentation feature obtained by upsampling the human body feature Es1.
In some embodiments of the present application, to fuse the feature information of the human body image and the garment image, the human body feature Esi, the segmentation feature Fs(i-1) and the affine-transformed garment feature Eci are fused into a first multi-channel image; the fusion method includes multi-channel concatenation, so the resulting image carries feature information of multiple dimensions across its channels. The feature information in the first multi-channel image includes: the distribution of body parts encoded by the human body feature, the body pose information in the human body feature, the distribution of garment parts encoded by the garment feature, and the degree of deformation of the garment in the affine-transformed garment feature.
Step S503: extract segmentation features from the first multi-channel image to obtain a segmentation feature Fsi.
In some embodiments of the present application, to raise the resolution, the method further includes upsampling the first multi-channel image before the segmentation feature extraction.
In some embodiments of the present application, because the first multi-channel image is upsampled before the segmentation feature extraction, the resolution of the segmentation feature Fsi is greater than that of the segmentation feature Fs(i-1).
In some embodiments of the present application, the segmentation feature extraction may be implemented with a 3 x 3 convolution layer, which yields a higher-resolution segmentation feature.
Step S504: extract garment optical flow features from the first multi-channel image to obtain a garment optical flow map Ffi.
In some embodiments of the present application, the garment optical flow feature extraction may use the upsampled first multi-channel image, so the resolution of the garment optical flow map Ffi is greater than that of the garment optical flow map Ff(i-1). Since the human body image may be a frame captured from a video, a human body optical flow image can also be obtained, providing a reference for determining the garment's optical flow vectors; the deformation encoded in the garment optical flow map Ffi thus comes closer to the body pose after each update.
In some embodiments of the present application, the garment optical flow feature extraction may likewise be implemented with a 3 x 3 convolution layer, which yields a higher-resolution garment optical flow map.
In yet another embodiment of the present application, as shown in Fig. 6, a preset feature fusion module may implement each update. Specifically, each update takes as input the garment optical flow map Ff(i-1), the garment feature Eci, the human body feature Esi and the segmentation feature Fs(i-1). The garment optical flow map Ff(i-1) is first upsampled; the garment feature Eci is then affine-transformed according to Ff(i-1) to obtain the affine-transformed garment feature Eci; next, the human body feature Esi, the segmentation feature Fs(i-1) and the affine-transformed garment feature Eci are fused into a first multi-channel image; finally, the segmentation feature Fsi is extracted from the upsampled first multi-channel image with a 3 x 3 convolution layer, and the garment optical flow map Ffi with another 3 x 3 convolution layer.
In one example of the present application, the target segmentation feature and the target garment optical flow map may be obtained as follows: the garment optical flow map Ff0, the segmentation feature Fs0, and the garment feature Ec1 and human body feature Es1 at 32 x 32 resolution pass through a feature fusion module (e.g., as shown in Fig. 6) to yield the garment optical flow map Ff1 and segmentation feature Fs1 at 64 x 64 resolution; Ff1, Fs1, Ec2 and Es2 pass through a feature fusion module to yield Ff2 and Fs2 at 128 x 128 resolution; Ff2, Fs2, and Ec3 and Es3 at 128 x 128 resolution pass through a feature fusion module to yield Ff3 and Fs3 at 256 x 256 resolution; finally, Ff3, Fs3, and Ec4 and Es4 at 256 x 256 resolution pass through a feature fusion module to yield the garment optical flow map Ff4 and segmentation feature Fs4 at 512 x 512 resolution, with Ff4 taken as the target garment optical flow map and Fs4 as the target segmentation feature. A sketch of one such update follows.
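Below is a sketch of one update of the feature fusion module of Fig. 6, reusing warp_by_flow from the earlier sketch. The patent specifies only the warp, the channel concatenation, the upsampling of the multi-channel image, and the two 3 x 3 convolution heads; the channel widths and the exact placement of the upsampling step are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionUpdate(nn.Module):
    """One update of the feature fusion module (Fig. 6). A sketch: channel
    counts and where exactly the upsampling happens are assumptions."""

    def __init__(self, c_human: int, c_garment: int, c_seg: int):
        super().__init__()
        in_ch = c_human + c_garment + c_seg
        # 3 x 3 convolutions produce the next segmentation feature Fsi and a
        # 2-channel garment optical flow map Ffi, as described above.
        self.seg_head = nn.Conv2d(in_ch, c_seg, kernel_size=3, padding=1)
        self.flow_head = nn.Conv2d(in_ch, 2, kernel_size=3, padding=1)

    def forward(self, flow_prev, seg_prev, e_c, e_s):
        # Warp the garment feature Eci by the previous flow map Ff(i-1).
        e_c_warped = warp_by_flow(e_c, flow_prev)
        # Fuse Esi, Fs(i-1) and the warped Eci by channel concatenation.
        fused = torch.cat((e_s, seg_prev, e_c_warped), dim=1)
        # Upsample the multi-channel image before the two conv heads, so
        # Fsi and Ffi come out at twice the input resolution.
        fused = F.interpolate(fused, scale_factor=2, mode="bilinear",
                              align_corners=True)
        return self.flow_head(fused), self.seg_head(fused)
```

Chaining four such updates reproduces the example above: a 32 x 32 flow map Ff0 is refined into the 512 x 512 target flow map Ff4.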
Step S203: generate a deformed image of the garment image according to the target garment optical flow map and the target segmentation feature, and obtain a mask segmentation image of the garment image from the deformed image.
In some embodiments of the present application, this step includes: performing an affine transformation on the garment image according to the target garment optical flow map to obtain the deformed image; and multiplying the target segmentation feature and the deformed image pixel by pixel, then normalizing the product image to obtain the mask segmentation image.
In some embodiments of the present application, as shown in Fig. 7, affine-transforming (warping) the garment image according to the target garment optical flow map includes: assigning the pixel value at each corresponding coordinate in the garment image to the corresponding position in the target garment optical flow map (e.g., the garment optical flow map Ff4 shown in Fig. 7), according to the coordinates of each pixel in that flow map. The deformation of the garment in the resulting deformed image matches the body pose in the human body image, so the garment can be accurately fused with the human body image to produce a virtual clothes-changing image.
In some embodiments of the present application, as shown in Fig. 7, the target segmentation feature (e.g., the segmentation feature Fs4 shown in Fig. 7) is multiplied pixel by pixel (mul) with the deformed image, and the product image is normalized (softmax) to obtain the mask segmentation image; the mask segmentation image contains a segmentation region for each body part, and the body part covered by the garment in the deformed image is determined from these regions. A sketch follows.
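The Fig. 7 pipeline can be sketched as follows, reusing warp_by_flow from above. Treating the softmax normalization as acting over the channel dimension of the segmentation feature, and reducing the deformed image to a single channel before the pixel-wise multiply, are assumptions made here for shape compatibility.

```python
import torch

def make_mask_segmentation(garment_img, target_flow, target_seg):
    """Sketch of Fig. 7: warp, pixel-wise multiply (mul), normalize (softmax)."""
    # Affine-transform (warp) the garment image by the target flow map Ff4.
    deformed = warp_by_flow(garment_img, target_flow)
    # Pixel-wise multiply the target segmentation feature Fs4 with the deformed
    # image; the single-channel reduction of the image is an assumption.
    product = target_seg * deformed.mean(dim=1, keepdim=True)
    # Normalize across segmentation channels to get the mask segmentation image.
    mask_seg = torch.softmax(product, dim=1)
    return deformed, mask_seg
```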
Step S204: obtain an initial clothes-changing image from the deformed image and the mask segmentation image, and perform image restoration on the initial clothes-changing image based on a diffusion model to obtain a target clothes-changing image in which the clothes changing operation has been applied to the target object.
In some embodiments of the present application, when the initial clothes-changing image is obtained from the deformed image and the mask segmentation image, since the two images have the same size and resolution, the garment in the deformed image can be fused directly onto the body region that corresponds to the garment in the mask segmentation image, replacing that region of the mask segmentation image with the garment from the deformed image, as sketched below.
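A sketch of the direct fusion described above; the assumption is that one channel of the mask segmentation image marks the body region covered by the garment, with values in [0, 1].

```python
def compose_initial_try_on(person_img, deformed_img, mask_seg, garment_ch=0):
    """Paste the deformed garment onto its body region (a sketch; the index
    of the garment-region channel is an assumption)."""
    region = mask_seg[:, garment_ch : garment_ch + 1]  # (N, 1, H, W) in [0, 1]
    # Inside the region take the deformed garment; elsewhere keep the person.
    return region * deformed_img + (1.0 - region) * person_img
```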
Image restoration of the initial clothes-changing image based on the diffusion model includes: when a missing body part exists in the initial clothes-changing image, repairing it using the diffusion model.
In some embodiments of the present application, the mask segmentation image may segment body parts inaccurately, so that body parts other than those actually covered by the deformed garment are treated as regions into which the garment is fused, leaving missing body parts in the initial clothes-changing image. For example, as shown in Fig. 8, the deformed garment is a short-sleeved top whose corresponding body region is the upper body excluding the arms; if the mask segmentation image mistakenly treats the arms as part of the garment region, the arms end up missing or rendered as gray in the initial clothes-changing image. Similarly, if the person in the human body image originally wore long sleeves, the regions covered by the long sleeves appear gray in the mask segmentation image, and fusing the short-sleeve deformed image into it again leaves the arms missing or gray in the initial clothes-changing image.
In some embodiments of the present application, the diffusion model can repair the missing body parts in the initial clothes-changing image, optionally taking as input the user's judgment of whether the body parts in the image look real. Specifically, the diffusion model fills in the pixels of the missing body parts through its sampling process, thereby repairing the missing parts in the initial clothes-changing image. An illustrative sketch follows.
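The patent does not name a particular diffusion model. As one hedged illustration, an off-the-shelf latent diffusion inpainting pipeline can fill the missing body region when given the initial clothes-changing image and a mask of the missing pixels; the checkpoint name and prompt below are assumptions, not part of the patent.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# Illustrative only: the patent does not specify a model; the checkpoint
# name and the prompt here are assumptions.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

repaired = pipe(
    prompt="a person wearing a short-sleeved top, natural bare arms",
    image=initial_try_on_image,      # PIL image of the initial result
    mask_image=missing_region_mask,  # white where the body is missing
).images[0]
```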
In other embodiments of the present application, the image restoration may be performed with a deep learning model that detects and repairs missing body parts in images. Training such a model may include: data collection, gathering an image dataset annotated with missing body parts, where the annotation may be a binary mask, a rectangular bounding box, or any other form indicating the missing parts; data preprocessing, applying scaling, cropping, rotation and similar operations to increase data diversity and ensure consistent, comparable inputs; model construction, building an initial model such as U-Net or Mask R-CNN with a convolutional neural network or a generative adversarial network; model training, training the initial model on the training split and updating it by minimizing the difference between generated and real images with an optimizer and a loss function, which may include mean squared error, perceptual loss, and so on; validation and tuning, evaluating the updated model on the validation split and optimizing accordingly, for example by trying different hyperparameter settings or modifying the network architecture; and model testing and deployment, evaluating the accuracy and robustness of the model on the test split, and deeming the model ready for image restoration once it reaches the expected performance level. A minimal training loop in this spirit is sketched below.
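A minimal training loop matching the phases above; the dataset layout (damaged image, missing-part mask, ground-truth image) and the choice of plain MSE over the listed alternatives (perceptual loss, etc.) are assumptions.

```python
import torch
import torch.nn as nn

def train_repair_model(model, loader, epochs=10, lr=1e-4, device="cuda"):
    """Minimal training loop for a body-part repair network (a sketch; the
    patent lists U-Net / Mask R-CNN and MSE / perceptual losses as options,
    so MSE is used here for brevity)."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for damaged, mask, target in loader:  # assumed dataset layout
            damaged, mask, target = (t.to(device) for t in (damaged, mask, target))
            # Model input: damaged image concatenated with the missing-part mask.
            pred = model(torch.cat((damaged, mask), dim=1))
            # Minimize the difference between generated and real images.
            loss = mse(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```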
The method provided by the embodiments of the present application divides the virtual clothes changing process into two stages: a garment deformation stage and a clothes-changing image generation stage. In the garment deformation stage, to deform the garment according to the body pose so that it can be put on the human body, a multi-scale feature fusion method predicts the deformation result at different resolutions and refines it progressively from low to high resolution, bottom-up, achieving a better deformation effect. In the clothes-changing image generation stage, the deformed garment is put on the human body with a diffusion-model-based method, and body regions lost to factors such as occlusion by the original garment are repaired, so that misaligned regions between the garment on the body and the product garment can be repaired better, achieving a better virtual clothes changing effect.
In some embodiments of the present application, as shown in Fig. 3, acquiring the first feature data of the human body image of the target object includes the following steps:
step S301, performing image segmentation on the human body image to obtain a first segmentation map of a human body in the human body image, and obtaining a pose depth map of the human body in the human body image.
In some embodiments of the present application, the human body is that of the target object, and the image segmentation method includes, but is not limited to, one or more of the following: semantic-segmentation-based body part segmentation and keypoint-detection-based body part segmentation. The first segmentation map obtained by segmenting the human body image may contain the region of each body part, providing a basis for fusing each garment part into the corresponding body region; for example, the arm region obtained by segmentation can serve as the region into which the sleeves of a sleeved garment will be fused.
In some embodiments of the present application, a depth sensor or a binocular camera may be used, together with a corresponding computation, to acquire the pose depth map of the human body. A depth sensor can capture scene depth and align it with the human body image to obtain the pose depth map, where the sensor may be structured-light based (e.g., Kinect v2) or based on the Time-of-Flight (ToF) principle. With a binocular camera, body images from the left and right viewpoints are captured, the disparity between the two views is computed, and the disparity is converted into the pose depth map by triangulation, as in the sketch below.
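For the binocular branch, triangulation reduces to depth = focal_length x baseline / disparity. A sketch using OpenCV's semi-global block matcher follows; the matcher settings are illustrative assumptions.

```python
import cv2
import numpy as np

def depth_from_stereo(left_gray, right_gray, focal_px, baseline_m):
    """Pose depth map from a binocular pair via triangulation:
    depth = focal_length * baseline / disparity."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                    blockSize=7)
    # StereoSGBM returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan  # mark invalid matches
    return focal_px * baseline_m / disparity
```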
In some embodiments of the present application, acquiring the pose depth map makes the body pose easier to determine, providing a reference for the later acquisition of the garment optical flow map, i.e., the deformed garment image, so that the deformed image matches the body pose. For example, if the depth map shows the arms crossed over the chest, the regions of a short-sleeved garment that the two forearms would cover should be cut out of its deformed image.
Step S302: perform image fusion on the first segmentation map and the pose depth map to obtain a first fused image.
In some embodiments of the present application, the image fusion may be a multi-channel concatenation operation (concat).
Step S303: perform multi-scale feature extraction on the first fused image to obtain human body features at multiple resolutions.
In some embodiments of the present application, the methods used for multi-scale feature extraction include, but are not limited to, one or more of the following: an image pyramid, which scales the first fused image to different sizes and extracts features at each level to obtain feature maps at different scales; and multi-scale convolution layers, which attach detectors of different scales to different output layers to complete the multi-scale feature extraction of the first fused image.
In some embodiments of the present application, since the multi-resolution human body features are extracted from the image fusing the segmentation map and the pose depth map, they provide a basis both for fusing each garment part into the corresponding body region and for acquiring the garment optical flow map, i.e., the deformed garment image, so that the deformed image can be fused in accordance with the body pose.
Step S304: upsample the lowest-resolution human body feature among the multi-resolution human body features to obtain an initial segmentation feature.
In some embodiments of the present application, upsampling the lowest-resolution human body feature makes the resulting initial segmentation feature consistent in size with the human body image; the initial segmentation feature contains both the body region segmentation information and the body pose information, providing a good reference for the subsequent clothes changing process and for acquiring the garment optical flow map.
Step S305: take the multi-resolution human body features and the initial segmentation feature as the first feature data. A sketch of such an encoder appears below.
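Steps S301 to S305 can be sketched as a small strided-convolution encoder. The four scales match the Es1..Es4 example above; the channel widths, the single-channel inputs, and the layer choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HumanEncoder(nn.Module):
    """Sketch of steps S301-S305; assumes a 1-channel segmentation map and
    a 1-channel pose depth map as inputs (in_ch=2) and four scales."""

    def __init__(self, in_ch: int = 2, width: int = 32):
        super().__init__()
        chans = [width, width * 2, width * 4, width * 8]
        stages, prev = [], in_ch
        for c in chans:
            stages.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            prev = c
        self.stages = nn.ModuleList(stages)

    def forward(self, seg_map, depth_map):
        # Step S302: fuse the segmentation map and pose depth map (concat).
        x = torch.cat((seg_map, depth_map), dim=1)
        pyramid = []
        for stage in self.stages:
            x = stage(x)
            pyramid.append(x)
        # Step S303: index so Es1 is the smallest-resolution feature.
        es = list(reversed(pyramid))
        # Step S304: upsample the smallest feature to get Fs0.
        fs0 = F.interpolate(es[0], scale_factor=2, mode="bilinear",
                            align_corners=True)
        # Step S305: (Es1..Es4, Fs0) form the first feature data.
        return es, fs0
```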
In some embodiments of the present application, as shown in Fig. 4, acquiring the second feature data of the selected garment image includes the following steps:
and S401, performing image segmentation on the clothes image to obtain a second segmentation map of clothes in the clothes image.
In some embodiments of the present application, the clothes image is segmented to obtain the region where each part of the garment is located, providing a basis for fusing the image of each garment part onto the corresponding region of the human body. For example, image segmentation yields the region where the sleeves are located in the clothes image, and the image of that sleeve region can then be fused onto the region where the arms of the human body are located.
Step S402, performing image fusion on the clothes image and the second segmentation map to obtain a second fusion image.
In some embodiments of the present application, by fusing the segmentation map of the clothes image with the clothes image itself, the resulting fused image contains both the information of each garment region and the overall information of the garment. The garment features of multiple resolutions obtained by multi-scale feature extraction of this fused image then provide a basis for subsequently obtaining the deformation image of the garment and fusing it with the corresponding part of the human body.
Step S403, performing multi-scale feature extraction on the second fused image to obtain garment features of multiple resolutions.
Step S404, obtaining an initial garment optical flow map based on the garment feature with the smallest resolution among the garment features of the multiple resolutions.
In some embodiments of the present application, each pixel point in the garment feature with the smallest resolution may be displaced according to the direction and magnitude of a preset initial optical flow vector to obtain the initial garment optical flow map. The initial optical flow vector can be initialized from prior information; the deformation state it encodes may still differ considerably from the human body posture, and the image fusing the garment feature information with the human body feature information can be used to update the initial garment optical flow map repeatedly in the subsequent process (for example, step S504), so that the garment optical flow map gradually conforms to the human body posture.
In some embodiments of the present application, since the size of the garment feature with the smallest resolution among the garment features is consistent with the size of the clothes image, the initial garment optical flow map obtained from its optical flow features is also consistent with the size of the clothes image, which better meets the practical requirement of obtaining the deformation image of the garment from the optical flow map.
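One way to realize the prior initialization and the per-pixel displacement described above is a dense flow field applied through grid sampling; the zero initialization and the normalized-coordinate handling below are illustrative assumptions, not the disclosed implementation:

```python
import torch
import torch.nn.functional as F

def warp(feature, flow):
    """Warp a feature map by a per-pixel flow field.
    feature: (N, C, H, W); flow: (N, 2, H, W) in pixel offsets (x, y)."""
    n, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float()         # (H, W, 2) pixel grid
    grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)  # add flow offsets
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0    # normalize x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0    # normalize y to [-1, 1]
    return F.grid_sample(feature, grid, align_corners=True)

garment_feat = torch.rand(1, 256, 16, 12)  # smallest-resolution garment feature
init_flow = torch.zeros(1, 2, 16, 12)      # assumed prior: zero displacement
warped = warp(garment_feat, init_flow)     # equals garment_feat for zero flow
```

Later updates (for example, step S504) would refine the flow field so that the warped garment gradually conforms to the human body posture.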
Step S405, taking the garment features of the multiple resolutions and the initial garment optical flow map as the second feature data.
In some embodiments of the present application, a virtual clothes changing model may also be constructed and trained according to the methods of the above embodiments, with each network structure or network layer in the model being trained, thereby obtaining a virtual clothes changing model that implements the methods of the above embodiments.
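As a purely structural sketch of such a model (all module names are assumptions, and the composition of the initial clothes changing image below is one plausible reading of the embodiments, not the disclosed formula):

```python
import torch.nn as nn

class VirtualTryOnModel(nn.Module):
    """Skeleton wiring the four stages described in the embodiments;
    each submodule would be implemented per the corresponding steps."""
    def __init__(self, human_encoder, garment_encoder, fusion, generator, restorer):
        super().__init__()
        self.human_encoder = human_encoder      # first feature data (S301-S305)
        self.garment_encoder = garment_encoder  # second feature data (S401-S405)
        self.fusion = fusion                    # iterative flow/segmentation refinement
        self.generator = generator              # deformation image + mask segmentation
        self.restorer = restorer                # diffusion-based image restoration

    def forward(self, human_img, garment_img):
        first = self.human_encoder(human_img)
        second = self.garment_encoder(garment_img)
        flow, seg = self.fusion(first, second)
        deformed, mask = self.generator(garment_img, flow, seg)
        # Assumed composition: paste the deformed garment onto the person.
        initial = deformed * mask + human_img * (1 - mask)
        return self.restorer(initial)
```

Training would then optimize each submodule (for example, with segmentation, flow-smoothness, and reconstruction losses), though the disclosure does not specify the loss terms.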
Referring to fig. 9, a schematic block diagram of a virtual clothes changing device according to an embodiment of the present application is provided. The virtual clothes changing device serves one of the purposes of the present application and is a functional embodiment of the virtual clothes changing method of the present application. The virtual clothes changing device includes: a data acquisition module 71 for acquiring first feature data of a human body image and second feature data of a clothes image; a feature fusion module 72 for performing feature fusion on the human body image and the clothes image based on the first feature data and the second feature data to obtain a target garment optical flow map and a target segmentation feature; an image generation module 73 for generating a deformation image of the clothes image according to the target garment optical flow map and the target segmentation feature, and acquiring a mask segmentation image of the clothes image according to the deformation image; and an image restoration module 74 for obtaining an initial clothes changing image according to the deformation image and the mask segmentation image, and performing image restoration on the initial clothes changing image based on a diffusion model to obtain a target clothes changing image.
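The mask step inside the image generation module 73 (a pixel-by-pixel product of the target segmentation feature with the deformation image, followed by normalization, as also recited in claim 7 below) can be sketched directly; the channel reduction and the min-max normalization are assumptions, since the disclosure only says the product is normalized:

```python
import torch

deformed = torch.rand(1, 3, 256, 192)    # deformation image of the garment (assumed RGB)
target_seg = torch.rand(1, 1, 256, 192)  # target segmentation feature

# Pixel-by-pixel multiplication, then normalization of the product to [0, 1]
# to obtain the mask segmentation image.
prod = (target_seg * deformed).mean(dim=1, keepdim=True)  # assumed channel reduction
mask = (prod - prod.min()) / (prod.max() - prod.min() + 1e-8)
print(mask.shape)  # torch.Size([1, 1, 256, 192])
```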
Another embodiment of the present application further provides an electronic device. The application environment of fig. 1 is given only as an example; in other exemplary embodiments, a computer program product implementing the virtual clothes changing method of the embodiments of the present application may run on any electronic device (such as the electronic device shown in fig. 10) with sufficient computing power to perform the steps of the virtual clothes changing method, thereby providing the virtual clothes changing function.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 10, in an embodiment of the present application, the electronic device 800 may be a mobile phone, a tablet computer, a smart wearable device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, a netbook, or the like; the embodiment of the present application does not limit the specific type of the electronic device 800.
As shown in fig. 10, the electronic device 800 may include, but is not limited to, a communication module 81, a memory 82, a processor 83, an input/output (I/O) interface 84, and a bus 85. The processor 83 is coupled to the communication module 81, the memory 82, and the I/O interface 84 via the bus 85.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 800 and does not constitute a limitation of the electronic device 800; the device may include more or fewer components than shown, combine certain components, or have different components. For example, the electronic device 800 may also include a network access device, etc.
The communication module 81 may include a wired communication module and/or a wireless communication module. The wired communication module may provide one or more wired communication solutions such as universal serial bus (USB) or controller area network (CAN). The wireless communication module may provide one or more wireless communication solutions such as wireless fidelity (Wi-Fi), Bluetooth (BT), mobile communication networks, frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
The memory 82 may be used to store computer readable instructions and/or modules; the processor 83 implements the various functions of the electronic device 800 by running or executing the computer readable instructions and/or modules stored in the memory 82 and invoking the data stored in the memory 82. The memory 82 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the electronic device 800. The memory 82 may include non-volatile and volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another storage device.
The memory 82 may be an external memory and/or an internal memory of the electronic device 800. Further, the memory 82 may be a physical memory, such as a memory bank or a TF card (Trans-flash Card).
The processor 83 may be a central processing unit (CPU), or another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 83 is the operation core and control center of the electronic device 800; it connects the various parts of the entire electronic device 800 using various interfaces and lines, and runs the operating system of the electronic device 800 as well as the installed applications, program codes, and so on.
For example, the computer readable instructions may be partitioned into one or more modules/sub-modules/units that are stored in the memory 82 and executed by the processor 83 to implement the present application. The one or more modules/sub-modules/units may be a series of computer readable instruction segments capable of performing a particular function, the instruction segments describing the execution of the computer readable instructions in the electronic device 800. For example, the computer readable instructions may be partitioned into the data acquisition module 71, the feature fusion module 72, the image generation module 73, and the image restoration module 74.
If the integrated modules/units of the electronic device 800 are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, the present application implements all or part of the flow of the methods of the above embodiments by means of computer readable instructions, which may be stored in a computer readable storage medium and which, when executed by a processor, implement the steps of the method embodiments described above.
The computer readable instructions include computer readable instruction code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), or a random access memory (RAM).
With reference to fig. 2-7, the memory 82 in the electronic device 800 stores computer readable instructions, and the processor 83 can execute the computer readable instructions stored in the memory 82 to implement the virtual clothes changing method shown in fig. 2-7.
For the specific way in which the processor 83 executes the computer readable instructions, reference may be made to the descriptions of the related steps in the embodiments corresponding to fig. 2 to 7, which are not repeated here.
The I/O interface 84 is used to provide a channel for user input and output. For example, the I/O interface 84 may connect various input/output devices, such as a mouse, a keyboard, a touch device, or a display screen, so that a user may enter information or have information displayed.
The bus 85 is used at least to provide a channel for communication among the communication module 81, the memory 82, the processor 83, and the I/O interface 84 in the electronic device 800.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is merely a logical function division, and other divisions may be used in actual implementations.
Modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated in one processing unit, may each exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the claims may also be implemented by one unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not imply any particular order.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solution of the present application and not to limit it. Although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or equivalently substituted without departing from the spirit and scope of the technical solution of the present application.

Claims (10)

1. A virtual clothes changing method, the method comprising:
acquiring first feature data of a human body image of a target object, and acquiring second feature data of a selected clothes image;
performing feature fusion on the human body image and the clothes image based on the first feature data and the second feature data to obtain a target segmentation feature and a target garment optical flow map;
generating a deformation image of the clothes image according to the target garment optical flow map and the target segmentation feature, and acquiring a mask segmentation image of the clothes image according to the deformation image;
and obtaining an initial clothes changing image according to the deformation image and the mask segmentation image, and performing image restoration on the initial clothes changing image based on a diffusion model to obtain a target clothes changing image after the clothes changing operation is performed on the target object.
2. The virtual clothes changing method according to claim 1, wherein the acquiring the first feature data of the human body image of the target object comprises:
performing image segmentation on the human body image to obtain a first segmentation map of a human body in the human body image, and acquiring a posture depth map of the human body in the human body image, wherein the human body comprises the human body of the target object;
performing image fusion on the first segmentation map and the posture depth map to obtain a first fusion image;
performing multi-scale feature extraction on the first fusion image to obtain human body features of multiple resolutions;
upsampling the human body feature with the smallest resolution among the human body features of the multiple resolutions to obtain an initial segmentation feature;
and taking the human body features of the multiple resolutions and the initial segmentation feature as the first feature data.
3. The virtual clothes changing method according to claim 1, wherein the acquiring second feature data of the selected clothes image comprises:
performing image segmentation on the clothes image to obtain a second segmentation map of clothes in the clothes image;
performing image fusion on the clothes image and the second segmentation map to obtain a second fusion image;
performing multi-scale feature extraction on the second fusion image to obtain garment features of multiple resolutions;
obtaining an initial garment optical flow map based on the garment feature with the smallest resolution among the garment features of the multiple resolutions;
and taking the garment features of the multiple resolutions and the initial garment optical flow map as the second feature data.
4. The virtual clothes changing method according to claim 1, wherein the performing feature fusion on the human body image and the clothes image based on the first feature data and the second feature data to obtain a target segmentation feature and a target garment optical flow map comprises:
performing a preset number of iterative updates on the initial segmentation feature in the first feature data and the initial garment optical flow map in the second feature data to obtain the target garment optical flow map and the target segmentation feature, wherein each update of the preset number of iterative updates comprises:
performing affine transformation on the garment feature Eci in the second feature data according to the garment optical flow map Ff(i-1) to obtain an affine-transformed garment feature Eci, wherein i denotes the update index; when i=1, the optical flow map Ff0 denotes the initial garment optical flow map, and the resolution of the garment feature Eci is smaller than that of the garment feature Ec(i+1);
fusing the human body feature Esi in the first feature data, the segmentation feature Fs(i-1), and the affine-transformed garment feature Eci to obtain a first multi-channel image, wherein the resolution of the human body feature Esi is smaller than that of the human body feature Es(i+1), and when i=1 the segmentation feature Fs0 denotes the initial segmentation feature;
extracting segmentation features of the first multi-channel image to obtain a segmentation feature Fsi, wherein the resolution of the segmentation feature Fsi is greater than that of the segmentation feature Fs(i-1);
and extracting garment optical flow features of the first multi-channel image to obtain a garment optical flow map Ffi, wherein the resolution of the garment optical flow map Ffi is greater than that of the garment optical flow map Ff(i-1).
5. The virtual clothes changing method according to claim 4, wherein the performing affine transformation on the garment feature Eci in the second feature data according to the garment optical flow map Ff(i-1) to obtain the affine-transformed garment feature Eci comprises:
assigning, according to the coordinates of each pixel point in the upsampled garment optical flow map Ff(i-1), the pixel value of each pixel point in the garment feature Eci to the pixel point at the corresponding coordinates in the garment optical flow map Ff(i-1), so as to obtain the affine-transformed garment feature Eci.
6. The virtual clothes changing method according to claim 4, wherein before the affine transformation is performed on the garment feature Eci in the second feature data according to the garment optical flow map Ff(i-1), the method further comprises: upsampling the garment optical flow map Ff(i-1);
and before the segmentation feature extraction is performed on the first multi-channel image, the method further comprises: upsampling the first multi-channel image.
7. The virtual clothes changing method according to claim 1, wherein the generating a deformation image of the clothes image according to the target garment optical flow map and the target segmentation feature, and acquiring a mask segmentation image of the clothes image according to the deformation image, comprises:
performing affine transformation on the clothes image according to the target garment optical flow map to obtain the deformation image;
and multiplying the target segmentation feature and the deformation image pixel by pixel, and normalizing the image obtained by the pixel-by-pixel multiplication to obtain the mask segmentation image.
8. The virtual clothes changing method according to claim 1, wherein the performing image restoration on the initial clothes changing image based on the diffusion model comprises:
when a missing human body part exists in the initial clothes changing image, repairing the missing human body part in the initial clothes changing image by using the diffusion model.
9. A virtual clothes changing apparatus for implementing the virtual clothes changing method according to any one of claims 1 to 8, the virtual clothes changing apparatus comprising:
a data acquisition module for acquiring first feature data of a human body image and second feature data of a clothes image;
a feature fusion module for performing feature fusion on the human body image and the clothes image based on the first feature data and the second feature data to obtain a target garment optical flow map and a target segmentation feature;
an image generation module for generating a deformation image of the clothes image according to the target garment optical flow map and the target segmentation feature, and acquiring a mask segmentation image of the clothes image according to the deformation image;
and an image restoration module for obtaining an initial clothes changing image according to the deformation image and the mask segmentation image, and performing image restoration on the initial clothes changing image based on a diffusion model to obtain a target clothes changing image.
10. An electronic device, comprising:
memory device
A processor executing computer readable instructions stored in the memory, implementing the virtual changing method of any one of claims 1 to 8.
CN202311441036.9A 2023-10-31 2023-10-31 Virtual clothes changing method and device and electronic equipment Pending CN117422851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311441036.9A CN117422851A (en) 2023-10-31 2023-10-31 Virtual clothes changing method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN117422851A 2024-01-19

Family

ID=89529936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311441036.9A Pending CN117422851A (en) 2023-10-31 2023-10-31 Virtual clothes changing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117422851A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745990A (en) * 2024-02-21 2024-03-22 虹软科技股份有限公司 Virtual fitting method, device and storage medium
CN117745990B (en) * 2024-02-21 2024-05-07 虹软科技股份有限公司 Virtual fitting method, device and storage medium

Similar Documents

Publication Publication Date Title
JP2022524891A (en) Image processing methods and equipment, electronic devices and computer programs
CN110648397B (en) Scene map generation method and device, storage medium and electronic equipment
CN110517214B (en) Method and apparatus for generating image
CN110490959B (en) Three-dimensional image processing method and device, virtual image generating method and electronic equipment
CN112308051B (en) Text box detection method and device, electronic equipment and computer storage medium
CN113724368A (en) Image acquisition system, three-dimensional reconstruction method, device, equipment and storage medium
CN117422851A (en) Virtual clothes changing method and device and electronic equipment
JP7282474B2 (en) Encryption mask determination method, encryption mask determination device, electronic device, storage medium, and computer program
JP2023540730A (en) Methods, devices, electronic devices, and readable storage media for constructing topographic maps
CN110969641A (en) Image processing method and device
CN117372604B (en) 3D face model generation method, device, equipment and readable storage medium
CN113766117B (en) Video de-jitter method and device
CN113658035A (en) Face transformation method, device, equipment, storage medium and product
CN111353325A (en) Key point detection model training method and device
CN113628349B (en) AR navigation method, device and readable storage medium based on scene content adaptation
CN115775300A (en) Reconstruction method of human body model, training method and device of human body reconstruction model
CN111652831B (en) Object fusion method and device, computer-readable storage medium and electronic equipment
CN113781653A (en) Object model generation method and device, electronic equipment and storage medium
CN114140320A (en) Image migration method and training method and device of image migration model
Rodrigues et al. AR contents superimposition on walls and persons
CN115953515B (en) Cartoon image generation method, device, equipment and medium based on real person data
JP7405448B2 (en) Image processing method and image processing system
CN114820908B (en) Virtual image generation method and device, electronic equipment and storage medium
CN116434316B (en) Identity recognition method, device, equipment and medium based on X86 industrial control main board
CN116962816B (en) Method and device for setting implantation identification, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination