CN113592971A - Virtual human body image generation method, system, equipment and medium


Info

Publication number
CN113592971A
Authority
CN
China
Prior art keywords
human body
target
body image
posture
source
Prior art date
Legal status
Granted
Application number
CN202110865481.2A
Other languages
Chinese (zh)
Other versions
CN113592971B (en)
Inventor
王乐
师皓玥
周三平
陈仕韬
辛景民
郑南宁
Current Assignee
Ningbo Shun'an Artificial Intelligence Research Institute
Xian Jiaotong University
Original Assignee
Ningbo Shun'an Artificial Intelligence Research Institute
Xian Jiaotong University
Priority date
Filing date
Publication date
Application filed by Ningbo Shun'an Artificial Intelligence Research Institute, Xian Jiaotong University filed Critical Ningbo Shun'an Artificial Intelligence Research Institute
Priority to CN202110865481.2A
Publication of CN113592971A
Application granted
Publication of CN113592971B
Active (current legal status)
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a virtual human body image generation method, system, device and medium. The method comprises: inputting a source human body image and a target posture image into a pre-trained virtual human body image generation network to obtain a human body image in the target posture. The virtual human body image generation network is a convolutional neural network comprising: an encoder, which takes the source human body image and the target posture image as input and encodes them into source human body features and target human body features; a structure-based appearance generation module, which takes the source human body features and target human body features as input and updates them to obtain updated source human body features and target human body features; and a decoder, which takes the target human body features output by the structure-based appearance generation module as input and decodes them into the target-posture human body image. The invention uses a human-structure-based, posture-guided virtual human body image generation network to generate realistic human body images with the correct target posture.

Description

Virtual human body image generation method, system, equipment and medium
Technical Field
The invention belongs to the interdisciplinary field of computer vision and computer graphics, and particularly relates to a virtual human body image generation method, system, device and medium.
Background
The posture-guided virtual human body image generation task aims to generate, from a source human body image and a given target posture image, a new human body image in the target posture, where the human body posture of the generated image is consistent with the target posture and the human body appearance is similar to that of the source human body image. The task has many application scenarios, such as film production, virtual reality, and data augmentation for action recognition tasks.
At present, posture-guided virtual image generation methods mainly have the following defect:
generating a virtual human body image in a target posture requires simultaneously maintaining the posture consistency and the appearance consistency of the generated image; the target posture usually differs greatly from the posture in the source human body image and can be very complex. Existing methods usually consider how to deform the source human body image to obtain a human body image in the target posture; because the validity of the deformation cannot be guaranteed, the human body posture of the generated image is often blurred or even inconsistent with the target posture, so the quality of the generated human body image in the specified posture is very low.
In summary, there is a need for a new method, system, device and medium for generating virtual human body images under posture guidance based on human body structures.
Disclosure of Invention
The present invention is directed to a method, system, device and medium for generating a virtual human body image, so as to solve one or more of the above-mentioned problems. The invention utilizes the virtual human body image generation network under the posture guidance based on the human body structure to generate the vivid human body image with the correct target posture.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a virtual human body image generation method, which comprises the following steps:
inputting a source human body image and a target posture image into a pre-trained virtual human body image generation network, and obtaining a target-posture human body image from the output of the network;
wherein, the virtual human body image generation network is a convolution neural network, and comprises:
the encoder is used for inputting a source human body image and a target posture image, and encoding to obtain a source human body characteristic and a target human body characteristic;
the structure-based appearance generation module is used for inputting and updating the source human body characteristics and the target human body characteristics to obtain updated source human body characteristics and target human body characteristics;
and the decoder is used for inputting the target human body characteristics output by the structure-based appearance generation module and decoding to obtain a target posture human body image.
The invention has the further improvement that the step of acquiring the trained virtual human body image generation network specifically comprises the following steps:
acquiring a sample data set; each sample data in the sample data set comprises source human body image sample data, target human body image sample data, source human body posture sample data and target human body posture sample data;
inputting source human body image sample data, source human body posture sample data and target human body posture sample data in selected sample data of the sample data set into the virtual human body image generation network to obtain virtual target human body image data; constructing a loss function based on the virtual target human body image data and target human body image sample data in the selected sample data, and performing iterative optimization on the virtual human body image generation network;
and obtaining the trained virtual human body image generation network after reaching the preset iteration times or convergence conditions.
A further refinement of the invention is that the structure-based appearance generation module comprises:
the structure perception self-adaptive normalization module is used for inputting the source human body characteristics and the target human body characteristics, generating stylized target posture characteristics and outputting the stylized target posture characteristics;
and the characteristic enhancement module is used for inputting the generated stylized target posture characteristic and the source human body characteristic and outputting the updated source human body characteristic and the updated target human body characteristic.
In the sample data set, the step of obtaining the source human body posture sample data and the target human body posture sample data in each sample data comprises: performing posture estimation on the human body images with the OpenPose posture estimation method to obtain coordinate sequences of 18 human body joint points; wherein the joint point coordinate sequence of the source human body image I_s is expressed as P(I_s) = {p_1, ..., p_K}, K = 18, and the joint point coordinate sequence of the target human body image I_t is expressed as P(I_t) = {p_1, ..., p_K}, K = 18; based on the obtained human body joint point coordinate sequences, K heat maps are used to represent the human body posture information; wherein the source human body posture information is denoted P_s and the target human body posture information is denoted P_t.
In the encoder, the step of encoding to obtain the source human body features and the target human body features specifically comprises: encoding the target posture information P_t into the target human body features C_t with 2 downsampling convolutional layers; and encoding the source human body image I_s together with the source posture information P_s into the source human body features C_s with 2 downsampling convolutional layers.
A further improvement of the present invention is that, in the structure-based appearance generation module, the step of inputting and updating the source human body features and the target human body features to obtain updated source human body features and target human body features specifically comprises:
dividing the human body image into several human body parts and 1 background part based on the obtained human body joint point coordinate sequence, and obtaining L part masks; wherein the part masks of the source human body image are denoted M_s and the part masks of the target human body image are denoted M_t;
convolving the target human body features C_t with two convolutional layers to obtain target posture features F_t, and convolving the source human body features C_s with two convolutional layers to obtain source human body features F_s;
generating a style vector matrix V_sty from the source human body features F_s and the part masks M_s of the source human body image; wherein each row v_sty^l of V_sty is a C-dimensional vector representing the features of one part of the source human body image; the style vector of the l-th part is obtained by mean pooling as
v_sty^l = Pool(Resize(M_s^l) ⊙ F_s),
where Resize(·) denotes a scaling operation, ⊙ denotes element-wise multiplication, and Pool(·) denotes a pooling operation;
inserting the style vectors V_sty into the corresponding parts of the part masks M_t of the target human body image, according to the correspondence between the parts of the source human body image and the target human body image, to obtain a style matrix T_sty; wherein the l-th style vector v_sty^l is inserted into the l-th mask of the target human body image by broadcasting to generate the l-th style matrix T_sty^l, and all L style matrices T_sty^l are added element by element to obtain the final style matrix T_sty;
convolving the style matrix T_sty with two convolutional layers to obtain the modulation parameters γ and β used in the normalization operation; performing batch normalization on the target posture features F_t to obtain F_norm; modulating F_norm with γ and β to obtain the stylized target posture features F_sty:
F_sty = γ F_norm + β;
concatenating and fusing the stylized target posture features F_sty and the source human body features F_s, and then enhancing the fused features with a Squeeze-and-Excitation operation to obtain enhanced features F_fuse;
obtaining the updated target human body features C'_t by fusing the enhanced features F_fuse with C_t, F_t and F_s through a feature fusion operation that uses concatenation and addition;
wherein the updated source human body features C'_s are the concatenation of the source human body features F_s and the updated target human body features C'_t.
A further refinement of the invention provides that the loss function comprises: an adversarial loss function, a perceptual loss function, and a loss function based on human body structure similarity.
A further improvement of the present invention is that the structure-based appearance generation module is replaced with an integrated appearance generation module;
the integrated appearance generation module is composed of a plurality of cascaded structure-based appearance generation modules.
The invention relates to a virtual human body image generation system, which comprises:
the image generation module is used for inputting the source human body image and the target posture image into a pre-trained virtual human body image generation network, and obtaining a target-posture human body image from the output of the network;
wherein, the virtual human body image generation network is a convolution neural network, and comprises:
the encoder is used for inputting a source human body image and a target posture image, and encoding to obtain a source human body characteristic and a target human body characteristic;
the structure-based appearance generation module is used for inputting and updating the source human body characteristics and the target human body characteristics to obtain updated source human body characteristics and target human body characteristics;
and the decoder is used for inputting the target human body characteristics output by the structure-based appearance generation module and decoding to obtain a target posture human body image.
An electronic device of the present invention includes a processor and a memory, the processor is configured to execute a computer program stored in the memory to implement the virtual human body image generation method according to any one of the above aspects of the present invention.
A computer-readable storage medium of the present invention stores at least one instruction, which when executed by a processor, implements a virtual human body image generation method as any one of the above.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a novel method for generating a virtual human body image under posture guidance based on a human body structure, which can generate a vivid human body image with a correct target posture. Specifically, aiming at the technical problems that the effectiveness of deformation cannot be guaranteed, the quality of the generated human body image with the specified posture is low, a fuzzy human body posture is easy to generate, and even the consistency of the posture cannot be maintained in the existing method, the invention constructs a human body structure-based posture-guided virtual human body image generation network (SAGN), and carries out iterative optimization on the constructed convolutional neural network (SAGN) to obtain a pre-trained convolutional neural network (SAGN) to realize the posture-guided virtual human body image generation. The virtual human body image generation network (SAGN) under the posture guidance based on the human body structure can directly generate the appearance of the virtual human body image generation network according to the target posture, so that the consistency of the human body posture and the target posture of the generated image can be ensured to the maximum extent, and meanwhile, the virtual human body image generation network also has vivid human body appearance, so that a vivid human body image with a correct posture can be generated, and the virtual human body image generation under the guidance of the target posture is realized; meanwhile, a new idea is provided for solving the difficult task of generating the human body image in the target posture.
In the system, to address the problems that existing methods cannot guarantee the validity of deformation, that the generated human body image in the specified posture is of low quality, and that blurred human body postures are easily produced or the posture consistency cannot even be maintained, a human-structure-based, posture-guided virtual human body image generation network (SAGN) is introduced, which consists of a series of structure-based appearance generation modules (SAG-Blk). Each structure-based appearance generation module consists of a structure-aware adaptive normalization (SAN) sub-module and a Feature Enhancement (FE) sub-module; the structure-aware adaptive normalization (SAN) module generates stylized target posture features using a normalization method, and the Feature Enhancement (FE) module provides rich appearance information for the stylized target posture features, further enhancing their appearance features. The two sub-modules cooperate to gradually generate a realistic human body image with the correct posture, thereby realizing virtual human body image generation under the guidance of the target posture.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flowchart of a method for generating a virtual human body image under posture guidance based on a human body structure according to an embodiment of the present invention;
FIG. 2 is a schematic view of a joint of a human body according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a virtual human body image generation network (SAGN) under the guidance of human body structure-based posture in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a structure-aware adaptive normalization (SAN) module in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a partial result on a Market-1501 data set in accordance with an embodiment of the present invention;
FIG. 6 is a schematic diagram of partial results on the DeepFashion dataset in an embodiment of the present invention.
Detailed Description
In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.
Example 1
In embodiment 1 of the present invention, a method for generating a virtual human body image under posture guidance based on a human body structure is provided. To address the problems that existing methods cannot guarantee the validity of deformation, that the generated human body image in a specified posture is of low quality, and that blurred human body postures are easily produced or the posture consistency cannot even be maintained, the method directly generates the appearance according to the target posture so as to maximally ensure the consistency between the human body posture of the generated image and the target posture.
SAGN consists of a series of structure-based appearance generation modules (SAG-Blk); each structure-based appearance generation module consists of a structure-aware adaptive normalization (SAN) sub-module and a Feature Enhancement (FE) sub-module. The structure-aware adaptive normalization (SAN) module generates stylized target posture features using a normalization method, and the Feature Enhancement (FE) module provides rich appearance information for the stylized target posture features, further enhancing their appearance features. The two sub-modules cooperate to gradually generate a realistic human body image with the correct posture, thereby realizing virtual human body image generation under the guidance of the target posture.
The method for generating the virtual human body image under the posture guidance based on the human body structure in the embodiment 1 of the invention comprises the following steps:
step 1, acquiring a source human body image and a target human body image, and obtaining posture information of the source human body image and the target human body image from them; the source human body image and the target human body image represent different human body postures of the same appearance;
step 2, constructing a virtual human body image generation network (SAGN) guided by the posture based on the human body structure; wherein the specific steps of the virtual human body image generation network (SAGN) construction include:
constructing an Encoder (Encoder); constructing a structure-based appearance generation module (SAG-Blk); constructing a Decoder (Decoder); wherein, the specific steps of the structure-based appearance generation module construction include: constructing a structure-aware adaptive normalization (SAN) module and a Feature Enhancement (FE) module;
step 3, inputting the source human body image, the source human body posture information and the target human body posture information in the step 1 into the network constructed in the step 2 to obtain a virtual target human body image;
step 4, constructing a loss function based on the virtual target human body image obtained in the step 3 and the target human body image acquired and obtained in the step 1, and performing iterative optimization on the network constructed in the step 2; and after the preset iteration times are reached, obtaining an optimized virtual human body image generation network under the posture guidance based on the human body structure, and generating a vivid human body image with a correct posture by realizing the virtual human body image generation under the target posture guidance.
In embodiment 2 of the present invention, in step 1, the specific steps of obtaining the posture information of the source human body image and the target human body image according to the source human body image and the target human body image include:
step 1.1, carrying out posture estimation on the human body image by using a posture estimation method to obtain joint point coordinate sequences of a preset number of source human body images and joint point coordinate sequences of a target human body image;
and step 1.2, representing the human body posture information by heat map based on the human body joint point coordinate sequence obtained in the step 1.1, and obtaining source human body posture information and target human body posture information.
Illustratively, step 1.1 in the embodiment of the present invention specifically comprises: performing posture estimation on the human body images with the OpenPose posture estimation method to obtain coordinate sequences of 18 human body joint points; wherein the joint point coordinate sequence of the source human body image I_s is expressed as P(I_s) = {p_1, ..., p_K}, K = 18, and the joint point coordinate sequence of the target human body image I_t is expressed as P(I_t) = {p_1, ..., p_K}, K = 18.
In step 1.2, the method specifically comprises: based on the human body joint point coordinate sequences obtained in step 1.1, representing the human body posture information with K heat maps; wherein the source human body posture information is denoted P_s and the target human body posture information is denoted P_t.
In embodiment 3 of the present invention, in step 2, the specific step of constructing a virtual human body image generation network (SAGN) based on the posture guidance of the human body structure includes:
step 2.1, constructing an Encoder (Encoder), encoding the input target posture information, the input source human body image and the input posture information respectively, and encoding the target posture information, the input source human body image and the input posture information into target human body characteristics and source human body characteristics to obtain an Encoder;
step 2.2, constructing a structure-based appearance generation module (SAG-Blk), which updates the target human body features and source human body features from step 2.1 (or the new target human body features and new source human body features output by the previous SAG-Blk) and generates the appearance information of the target human body features from step 2.1 according to the source human body features from step 2.1, obtaining new target human body features and new source human body features; wherein a total of T = 9 cascaded SAG-Blk modules gradually generate the appearance information of the target human body features, finally obtaining 9 cascaded structure-based appearance generation modules;
and 2.3, constructing a Decoder (Decoder), decoding the new target human body characteristics output by the last SAG-Blk in the step 2.2, and generating a human body image of the target posture to obtain the Decoder.
Illustratively, step 2.1 in the embodiment of the present invention specifically comprises: encoding the target posture information P_t into the target human body features C_t with 2 downsampling convolutional layers; and encoding the source human body image I_s together with the source posture information P_s into the source human body features C_s with 2 downsampling convolutional layers.
In embodiment 4 of the present invention, in step 2.2, the specific step of constructing the structure-based appearance generating module (SAG-Blk) includes:
step 2.2.1, a structure-aware adaptive normalization (SAN) module is constructed, the target human body feature in the step 2.1 (or a new target human body feature output by the last SAG-Blk) is updated by using a normalization method, and stylized target posture features are generated; finally obtaining a structure-aware self-adaptive normalization module;
step 2.2.2, constructing a Feature Enhancement (FE) module to provide rich appearance information for the stylized target posture feature obtained in step 2.2.1, namely, the source human body feature in step 2.1 (or a new source human body feature output by the last SAG-Blk) further enhances the appearance feature of the stylized target posture feature; and finally obtaining the characteristic enhancement module.
Illustratively, step 2.2.1 in the embodiment of the present invention specifically comprises: based on the human body joint point coordinate sequence obtained in step 1.1, dividing the human body image into several human body parts and 1 background part, and obtaining L part masks; wherein the part masks of the source human body image are denoted M_s and the part masks of the target human body image are denoted M_t.
Step 2.2.1 further specifically comprises: convolving the target human body features C_t (or the new target human body features C'_t output by the previous SAG-Blk) with two convolutional layers to obtain the target posture features F_t; and convolving the source human body features C_s (or the new source human body features C'_s output by the previous SAG-Blk) with two convolutional layers to obtain the source human body features F_s.
A style vector matrix V_sty is generated from the source human body features F_s and the part masks M_s of the source human body image; wherein each row v_sty^l of V_sty is a C-dimensional vector representing the features of one part of the source human body image; specifically, mean pooling is used here to obtain the style vector of the l-th part:
v_sty^l = Pool(Resize(M_s^l) ⊙ F_s),
where Resize(·) denotes the scaling operation, which scales M_s to the same size as F_s, i.e. H' × W'; ⊙ denotes element-wise multiplication; and Pool(·) denotes the pooling operation, in which all non-zero elements are pooled.
According to the correspondence between the parts of the source human body image and the target human body image, the style vectors V_sty are inserted into the corresponding parts of the part masks M_t of the target human body image to obtain the style matrix T_sty; wherein the l-th style vector v_sty^l is inserted into the l-th mask of the target human body image by broadcasting to generate the l-th style matrix T_sty^l, and all L style matrices T_sty^l are added element by element to obtain the final style matrix T_sty. The style matrix T_sty obtained here has a human body posture consistent with the target human body posture while containing the most critical human body appearance information.
The style matrix T_sty is convolved with two convolutional layers to obtain the modulation parameters γ and β used in the normalization operation; meanwhile, the target posture features F_t are batch-normalized to obtain F_norm; finally, F_norm is modulated with γ and β to obtain the stylized target posture features F_sty:
F_sty = γ F_norm + β;
notably, the stylized target posture features F_sty obtained here preserve the posture information of the target posture while also containing the most critical human body appearance information.
In step 2.2.2 of the embodiment of the present invention, the method specifically comprises: concatenating and fusing the stylized target posture features F_sty and the source human body features F_s, and then enhancing the fused features with a Squeeze-and-Excitation operation to obtain enhanced features F_fuse; a residual module is added to further accelerate network training, yielding the final new target human body features C'_t, where the feature fusion operation fuses C_t, F_t and F_s by concatenation and addition. Finally, the new source human body features C'_s are the concatenation of the source human body features F_s and the new target human body features C'_t.
In step 2.3 of the embodiment of the present invention, the method specifically comprises: decoding the new target human body features C'_t output by the last SAG-Blk with 2 upsampling convolutional layers to generate the human body image of the target posture.
In step 4 of the embodiment of the present invention, the loss function constructed based on the virtual target human body image obtained in step 3 and the target human body image acquired in step 1 specifically comprises: an adversarial loss function, a perceptual loss function, and a loss function based on human body structure similarity.
The system for generating a virtual human body image based on posture guidance of a human body structure in embodiment 4 of the present invention includes:
the sample acquisition module is used for acquiring a source human body image and a target human body image, and obtaining the source human body posture information and the target posture information according to the source human body image and the target human body image;
a network model construction module for constructing a virtual human body image generation network (SAGN) under the guidance of the posture based on the human body structure; wherein the human body image generation network (SAGN) comprises three parts: an Encoder (Encoder), a structure-based appearance generation module (SAG-Blk), a Decoder (Decoder); wherein the structure-based appearance generation module comprises two parts: a structure-aware adaptive normalization (SAN) module and a Feature Enhancement (FE) module;
the training module is used for inputting the source human body image, the source human body posture information and the target human body posture information into a constructed network (SAGN) to obtain a virtual target human body image;
an optimization module for constructing a loss function based on the virtual target human body image and the real target human body image, and performing iterative optimization on the network (SAGN); and after the preset iteration times are reached, obtaining an optimized virtual human body image generation network under the posture guidance based on the human body structure, and generating a vivid human body image with a correct posture by realizing the virtual human body image generation under the target posture guidance.
Addressing the problems that existing methods cannot guarantee the validity of deformation, that the generated human body image in the specified posture is of low quality, and that blurred human body postures are easily produced or the posture consistency cannot even be maintained, the system introduces a human-structure-based, posture-guided virtual human body image generation network (SAGN) composed of a series of structure-based appearance generation modules (SAG-Blk). Each structure-based appearance generation module consists of a structure-aware adaptive normalization (SAN) sub-module and a Feature Enhancement (FE) sub-module. The structure-aware adaptive normalization (SAN) module generates stylized target posture features using a normalization method, and the Feature Enhancement (FE) module provides rich appearance information for the stylized target posture features, further enhancing their appearance features. The two sub-modules cooperate to gradually generate a realistic human body image with the correct posture, thereby realizing virtual human body image generation under the guidance of the target posture.
The method for generating the virtual human body image under the posture guidance based on the human body structure, disclosed by the embodiment 5 of the invention, comprises the following steps of:
step 1, obtaining posture information of a source human body image and a target human body image according to the source human body image and the target human body image:
1.1) carrying out posture estimation on the human body image by using a posture estimation method to obtain joint point coordinate sequences of a preset number of source human body images and joint point coordinate sequences of a target human body image;
1.2) based on the human body joint point coordinate sequence obtained in the step 1.1), representing human body posture information by heat map, and obtaining source human body posture information and target human body posture information.
Step 2, constructing a virtual human body image generation network (SAGN) under the guidance of the posture of the human body structure:
2.1) constructing an Encoder (Encoder), respectively encoding the input target posture information, the input source human body image and the input posture information into target human body characteristics and source human body characteristics to obtain an Encoder;
2.2) constructing a structure-based appearance generation module (SAG-Blk), which updates the target human body features and source human body features from step 2.1) (or the new target human body features and new source human body features output by the previous SAG-Blk) and generates the appearance information of the target human body features from step 2.1) according to the source human body features from step 2.1), obtaining new target human body features and new source human body features; wherein a total of T = 9 cascaded SAG-Blk modules gradually generate the appearance information of the target human body features, finally obtaining 9 cascaded structure-based appearance generation modules;
2.3) constructing a Decoder (Decoder), decoding the new target human body characteristics output by the last SAG-Blk in the step 2.2), and generating a human body image of the target posture to obtain the Decoder.
Step 3, generating a target posture human body image:
1) organizing data input into a convolutional neural network;
2) and generating a target posture human body image by using the convolutional neural network (SAGN) constructed in the step 2.
Step 4, constructing a loss function of a convolutional neural network (SAGN):
4.1) constructing an adversarial loss function;
4.2) constructing a perception loss function;
4.3) constructing a loss function based on the similarity of human body structures;
step 5, optimizing network parameters, and realizing the generation of the virtual human body image under the guidance of the target posture:
5.1) carrying out iterative optimization on the network parameters constructed in the step 2 according to the loss function obtained in the step 4;
5.2) when the preset iteration number is reached, generating the virtual human body image under the guidance of the target posture by using the convolutional neural network (SAGN) constructed in the step 2.
Addressing the problems that existing methods cannot guarantee the validity of deformation, that the generated human body image in the specified posture is of low quality, and that blurred human body postures are easily produced or the posture consistency cannot even be maintained, the human-structure-based, posture-guided virtual human body image generation method introduces a human-structure-based, posture-guided virtual human body image generation network (SAGN) composed of a series of structure-based appearance generation modules (SAG-Blk). Each structure-based appearance generation module consists of a structure-aware adaptive normalization (SAN) sub-module and a Feature Enhancement (FE) sub-module. The structure-aware adaptive normalization (SAN) module generates stylized target posture features using a normalization method, and the Feature Enhancement (FE) module provides rich appearance information for the stylized target posture features, further enhancing their appearance features. The two sub-modules cooperate to gradually generate a realistic human body image with the correct posture, thereby realizing virtual human body image generation under the guidance of the target posture.
Referring to fig. 1, a method for generating a virtual human body image under posture guidance based on a human body structure according to an embodiment of the present invention includes the following steps:
step 1, obtaining posture information of a source human body image and a target human body image according to the source human body image and the target human body image:
1.1) performing posture estimation on the human body images with a posture estimation method to obtain the K = 18 joint point coordinates of the source human body image and the K = 18 joint point coordinates of the target human body image.
In the embodiment of the invention, the OpenPose posture estimation method is used to perform posture estimation on the human body images, obtaining coordinate sequences of 18 human body joint points; wherein the joint point coordinate sequence of the source human body image I_s is expressed as P(I_s) = {p_1, ..., p_K}, K = 18, and the joint point coordinate sequence of the target human body image I_t is expressed as P(I_t) = {p_1, ..., p_K}, K = 18; fig. 2 is a schematic diagram of the 18 joint points.
1.2) based on the human body joint point coordinate sequence obtained in the step 1.1), representing human body posture information by heat map, and obtaining source human body posture information and target human body posture information.
In the embodiment of the invention, in order to exploit the spatial characteristics of the human body joint point coordinates, K = 18 heat maps are used to represent the human body posture information; wherein the source human body posture information is denoted P_s and the target human body posture information is denoted P_t.
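For illustration only (the original disclosure contains no source code), the following is a minimal sketch of how the K = 18 joint point coordinates could be turned into K heat maps; the Gaussian formulation, the sigma value and the function name are assumptions rather than details specified in the patent.
import numpy as np

def joints_to_heatmaps(joints, height, width, sigma=6.0):
    # joints: array of shape (K, 2) holding (x, y) pixel coordinates; negative values mark an undetected joint
    K = joints.shape[0]
    heatmaps = np.zeros((K, height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for k, (x, y) in enumerate(joints):
        if x < 0 or y < 0:
            continue  # joint not detected by the pose estimator
        heatmaps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    return heatmaps

# Example (assumed image size): P_s = joints_to_heatmaps(joints_s, 256, 176) for the 18 OpenPose joints of the source image.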
Step 2, constructing a virtual human body image generation network (SAGN) under the guidance of the posture of the human body structure:
the virtual human body image generation network (SAGN) guided by the posture based on the human body structure is composed of an encoder, a structure-based appearance generation module and a decoder; fig. 3 is a schematic diagram of a virtual human body image generation network (SAGN) structure under the guidance of a human body structure-based posture.
2.1) constructing an Encoder (Encoder), and respectively encoding the input target attitude information and the input source human body image and attitude information.
In an embodiment of the present invention, the target posture information P_t is encoded into the target human body features C_t with 2 downsampling convolutional layers, and the source human body image I_s together with the source posture information P_s is encoded into the source human body features C_s with 2 downsampling convolutional layers.
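As an illustrative sketch of such a two-branch encoder, the following assumes PyTorch; the channel widths, normalization layers and activation functions are assumptions not specified in the text.
import torch
import torch.nn as nn

def down_block(in_ch, out_ch):
    # one downsampling convolutional layer (stride 2) with assumed normalization and activation
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    def __init__(self, k_joints=18, base=64):
        super().__init__()
        # target branch: K pose heat maps -> target human body features C_t
        self.target_enc = nn.Sequential(down_block(k_joints, base), down_block(base, 2 * base))
        # source branch: 3-channel image concatenated with K pose heat maps -> source human body features C_s
        self.source_enc = nn.Sequential(down_block(3 + k_joints, base), down_block(base, 2 * base))

    def forward(self, p_t, i_s, p_s):
        c_t = self.target_enc(p_t)
        c_s = self.source_enc(torch.cat([i_s, p_s], dim=1))
        return c_t, c_s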
2.2) building a structure-based appearance generation module (SAG-Blk).
In the embodiment of the invention, the human-structure-based, posture-guided virtual human body image generation network (SAGN) contains a total of T = 9 structure-based appearance generation modules (SAG-Blk), each of which consists of a structure-aware adaptive normalization (SAN) module and a Feature Enhancement (FE) module. The structure-aware adaptive normalization (SAN) module generates stylized target posture features using a normalization method, and the Feature Enhancement (FE) module provides rich appearance information for the stylized target posture features, further enhancing their appearance features. The two sub-modules cooperate to gradually generate a realistic human body image with the correct posture, thereby realizing virtual human body image generation under the guidance of the target posture.
2.2.1) constructing a structure-aware adaptive normalization (SAN) module.
In the embodiment of the invention, based on the human body joint point coordinate sequence obtained in step 1.1), the human body image is divided into 10 human body parts and 1 background part, including the head, left (right) upper arm, left (right) lower arm, left (right) thigh, left (right) shank, torso and background, and L part masks are obtained; wherein the part masks of the source human body image are denoted M_s and the part masks of the target human body image are denoted M_t.
Fig. 4 is a schematic diagram of the structure-aware adaptive normalization (SAN) module. In the embodiment of the invention, the target human body features C_t (or the new target human body features C'_t output by the previous SAG-Blk) are convolved with two convolutional layers to obtain the target posture features F_t, and the source human body features C_s (or the new source human body features C'_s output by the previous SAG-Blk) are convolved with two convolutional layers to obtain the source human body features F_s.
A style vector matrix V_sty is generated from the source human body features F_s and the part masks M_s of the source human body image; wherein each row v_sty^l of V_sty is a C-dimensional vector representing the features of one part of the source human body image; specifically, mean pooling is used here to obtain the style vector of the l-th part:
v_sty^l = Pool(Resize(M_s^l) ⊙ F_s),
where Resize(·) denotes the scaling operation, which scales M_s to the same size as F_s, i.e. H' × W'; ⊙ denotes element-wise multiplication; and Pool(·) denotes the pooling operation, in which all non-zero elements are pooled.
According to the correspondence between the parts of the source human body image and the target human body image, the style vectors V_sty are inserted into the corresponding parts of the part masks M_t of the target human body image to obtain the style matrix T_sty; wherein the l-th style vector v_sty^l is inserted into the l-th mask of the target human body image by broadcasting to generate the l-th style matrix T_sty^l, and all L style matrices T_sty^l are added element by element to obtain the final style matrix T_sty. The style matrix T_sty obtained here has a human body posture consistent with the target human body posture while containing the most critical human body appearance information.
The style matrix T_sty is convolved with two convolutional layers to obtain the modulation parameters γ and β used in the normalization operation; meanwhile, the target posture features F_t are batch-normalized to obtain F_norm; finally, F_norm is modulated with γ and β to obtain the stylized target posture features F_sty:
F_sty = γ F_norm + β;
notably, the stylized target posture features F_sty obtained here preserve the posture information of the target posture while also containing the most critical human body appearance information.
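By way of illustration, a minimal sketch of this structure-aware adaptive normalization step is given below; PyTorch is assumed, and the mask resizing mode, the two-convolution heads producing γ and β, and all class and argument names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels, affine=False)   # batch normalization of F_t
        self.to_gamma = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(True),
                                      nn.Conv2d(channels, channels, 3, padding=1))
        self.to_beta = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(True),
                                     nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, f_t, f_s, m_s, m_t):
        # f_t, f_s: (B, C, H', W') features; m_s, m_t: (B, L, H, W) binary part masks as float tensors
        b, c, h, w = f_s.shape
        m_s = F.interpolate(m_s, size=(h, w), mode="nearest")   # Resize(M_s) to the size of F_s
        m_t = F.interpolate(m_t, size=(h, w), mode="nearest")
        # masked mean pooling: one C-dimensional style vector per part (the rows of V_sty)
        area = m_s.sum(dim=(2, 3)).clamp(min=1.0)               # (B, L), avoids division by zero
        v_sty = torch.einsum("blhw,bchw->blc", m_s, f_s) / area.unsqueeze(-1)
        # broadcast each style vector into the matching target part mask and sum over parts (T_sty)
        t_sty = torch.einsum("blhw,blc->bchw", m_t, v_sty)
        gamma, beta = self.to_gamma(t_sty), self.to_beta(t_sty) # modulation parameters
        return gamma * self.norm(f_t) + beta                    # F_sty = gamma * F_norm + beta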
2.2.2) constructing a Feature Enhancement (FE) module.
The stylized target posture features F_sty and the source human body features F_s are concatenated and fused, and the fused features are then enhanced with a Squeeze-and-Excitation operation to obtain the enhanced features F_fuse; a residual module is added to further accelerate network training, yielding the final new target human body features C'_t, where the feature fusion operation fuses C_t, F_t and F_s by concatenation and addition. Finally, the new source human body features C'_s are the concatenation of the source human body features F_s and the new target human body features C'_t.
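For illustration, a minimal sketch of this feature enhancement step is given below, assuming a standard Squeeze-and-Excitation block; the channel reduction ratio, the fusion convolution and the exact residual form are assumptions.
import torch
import torch.nn as nn

class FeatureEnhance(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        fused = 2 * channels                       # concatenation of F_sty and F_s
        self.se = nn.Sequential(                   # Squeeze-and-Excitation on the fused features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, 1), nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(fused, channels, 3, padding=1)

    def forward(self, f_sty, f_s, c_t):
        x = torch.cat([f_sty, f_s], dim=1)         # concatenate and fuse
        f_fuse = self.fuse(x * self.se(x))         # SE-weighted enhanced features F_fuse
        return c_t + f_fuse                        # residual connection yields the new C'_t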
And 2.3) constructing a Decoder (Decoder), decoding the new target human body characteristics output by the last SAG-Blk in the step 2.2), and generating a human body image of the target posture.
In an embodiment of the invention, the new target human body features C'_t output by the last SAG-Blk in step 2.2) are decoded with 2 upsampling convolutional layers to generate the human body image of the target posture.
Step 3, generating a target posture human body image:
1) data input to the convolutional neural network is organized.
The data input into the network is divided into two parts, one part is the target human body posture information expressed by heat map obtained in step 1, and the other part is the source human body image and the source human body posture information expressed by heat map.
2) And generating a target posture human body image by using the convolutional neural network (SAGN) constructed in the step 2.
And inputting the organized data into a network to generate a human body image of the target posture.
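The following sketch shows, for illustration, how the pieces could be assembled end to end, reusing the Encoder, StructureAwareNorm and FeatureEnhance sketches above; the channel width, the per-block convolutions producing F_t and F_s, and the decoder details are assumptions.
import torch
import torch.nn as nn

class SAGN(nn.Module):
    def __init__(self, channels=128, num_blocks=9, k_joints=18):
        super().__init__()
        self.encoder = Encoder(k_joints=k_joints, base=channels // 2)   # from the encoder sketch above
        self.to_ft = nn.ModuleList([nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_blocks)])
        self.to_fs = nn.ModuleList([nn.Conv2d(channels if i == 0 else 2 * channels, channels, 3, padding=1)
                                    for i in range(num_blocks)])
        self.san = nn.ModuleList([StructureAwareNorm(channels) for _ in range(num_blocks)])
        self.fe = nn.ModuleList([FeatureEnhance(channels) for _ in range(num_blocks)])
        self.decoder = nn.Sequential(                                   # 2 upsampling convolutional layers
            nn.Upsample(scale_factor=2), nn.Conv2d(channels, channels // 2, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2), nn.Conv2d(channels // 2, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, i_s, p_s, p_t, m_s, m_t):
        c_t, c_s = self.encoder(p_t, i_s, p_s)
        for to_ft, to_fs, san, fe in zip(self.to_ft, self.to_fs, self.san, self.fe):
            f_t, f_s = to_ft(c_t), to_fs(c_s)
            f_sty = san(f_t, f_s, m_s, m_t)        # structure-aware adaptive normalization
            c_t = fe(f_sty, f_s, c_t)              # feature enhancement -> new C'_t
            c_s = torch.cat([f_s, c_t], dim=1)     # new C'_s = concat(F_s, C'_t)
        return self.decoder(c_t)                   # generated target-posture image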
Step 4, constructing a loss function of a convolutional neural network (SAGN):
In the embodiment of the invention, a combination of an adversarial loss function, a perceptual loss function and a loss function based on human body structure similarity is used as the loss function of the convolutional neural network (SAGN) provided by the invention.
The adversarial loss function uses discriminators to measure the distance between the real image distribution and the generated image distribution, and to continuously reduce the distance between the two distributions. The invention constructs two discriminators, an appearance discriminator and a posture discriminator, which are used to ensure the appearance consistency and the posture consistency between the real image I_t and the generated image; the adversarial objective is formulated over the distributions of the human body posture images and the real human body images and over the human body image generation network proposed by the invention. More details can be found in 'Progressive pose attention transfer for person image generation'.
The perceptual loss function is used to measure the similarity between the feature maps of the real image and the generated image; usually the L1 distance between the two feature maps is computed as the perceptual loss, namely
L_per = ||φ_i(I_t) − φ_i(Î_t)||_1,
where Î_t denotes the generated image and φ_i is the output of the i-th layer of a pre-trained network; the feature map output by the conv1_2 layer of a VGG-19 network pre-trained on ImageNet is usually adopted. See 'Perceptual losses for real-time style transfer and super-resolution' for more details.
The loss function based on human body structure similarity is used to measure the structural similarity of each human body part between the real image and the generated image. Accurately measuring the similarity of each human body part brings clear human body boundaries and detailed texture features to the virtual human body image. It is defined in terms of per-part structural similarities, where MSSIM(·,·) is the structural similarity between I_t and the generated image excluding the background (part 0), and SSIM_l(·,·) is the structural similarity of the l-th part of the human body image. See 'Loss Functions for Person Image Generation' for more details.
4.1) constructing an adversarial loss function.
'Generative adversarial networks' achieves good results in image generation by using the adversarial loss function, and the adversarial loss function from that paper is adopted as one of the loss functions of the convolutional neural network (SAGN) proposed by the present invention.
4.2) constructing a perception loss function.
"Perceptual losses for real-time style transfer and super-resolution" achieves better effect in style migration by using the Perceptual loss function, and the Perceptual loss function in the paper is taken as one of the loss functions of the convolutional neural network (SAGN) proposed by the invention for reference.
4.3) constructing a loss function based on the similarity of human body structures.
'Loss functions for person image generation' uses a loss function based on human body structure similarity to effectively compute the structural similarity of each part of the human body, achieving good results in human body image generation.
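For illustration, a minimal sketch of the loss combination described above is given below; the discriminator inputs, the loss weights and the VGG layer slicing are assumptions, and the per-part SSIM term is passed in from an external routine because its exact formulation is given only in the cited reference.
import torch
import torch.nn as nn
import torchvision

class PerceptualLoss(nn.Module):
    # L1 distance between VGG-19 feature maps (conv1_2), as described above
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg19(pretrained=True).features   # newer torchvision uses weights="IMAGENET1K_V1"
        self.slice = nn.Sequential(*list(vgg.children())[:4]).eval()  # through conv1_2 (plus ReLU)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, fake, real):
        return (self.slice(fake) - self.slice(real)).abs().mean()

def discriminator_loss(d_app, d_pose, real, fake, i_s, p_t):
    # appearance discriminator conditioned on the source image, posture discriminator on the target pose heat maps
    bce = nn.functional.binary_cross_entropy_with_logits
    loss = 0.0
    for d, cond in ((d_app, i_s), (d_pose, p_t)):
        real_pred = d(torch.cat([real, cond], dim=1))
        fake_pred = d(torch.cat([fake, cond], dim=1))
        loss = loss + bce(real_pred, torch.ones_like(real_pred)) + bce(fake_pred, torch.zeros_like(fake_pred))
    return loss

def generator_loss(fake, real, i_s, p_t, d_app, d_pose, perceptual,
                   l_structure=None, w_adv=1.0, w_per=1.0, w_str=1.0):
    bce = nn.functional.binary_cross_entropy_with_logits
    pred_app = d_app(torch.cat([fake, i_s], dim=1))
    pred_pose = d_pose(torch.cat([fake, p_t], dim=1))
    loss = w_adv * (bce(pred_app, torch.ones_like(pred_app)) + bce(pred_pose, torch.ones_like(pred_pose)))
    loss = loss + w_per * perceptual(fake, real)
    if l_structure is not None:                    # per-part SSIM term computed by an external routine
        loss = loss + w_str * l_structure
    return loss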
Step 5, optimizing network parameters, and realizing the generation of the virtual human body image under the guidance of the target posture:
5.1) carrying out iterative optimization on the network parameters constructed in the step 2 according to the loss function obtained in the step 4;
The Adam optimizer is used for 90k iterations, with β_1 = 0.5 and β_2 = 0.999.
5.2) when the preset iteration number is reached, generating the virtual human body image under the guidance of the target posture by using the convolutional neural network (SAGN) constructed in the step 2.
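As an illustrative sketch of the optimization loop, the following reuses the SAGN generator and the loss sketches above; the learning rate, the batch contents and the discriminator networks are assumptions (the text specifies only Adam with β_1 = 0.5, β_2 = 0.999 for 90k iterations).
import itertools
import torch

def train(generator, d_app, d_pose, loader, device="cuda", lr=2e-4, num_iters=90000):
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(itertools.chain(d_app.parameters(), d_pose.parameters()),
                             lr=lr, betas=(0.5, 0.999))
    perceptual = PerceptualLoss().to(device)        # from the loss sketch above
    data = itertools.cycle(loader)                  # loader is assumed to yield (I_s, P_s, P_t, I_t, M_s, M_t) batches
    for step in range(num_iters):
        i_s, p_s, p_t, i_t, m_s, m_t = (x.to(device) for x in next(data))
        fake = generator(i_s, p_s, p_t, m_s, m_t)

        d_opt.zero_grad()                           # discriminator update (standard GAN alternation assumed)
        d_loss = discriminator_loss(d_app, d_pose, i_t, fake.detach(), i_s, p_t)
        d_loss.backward()
        d_opt.step()

        g_opt.zero_grad()                           # generator update
        g_loss = generator_loss(fake, i_t, i_s, p_t, d_app, d_pose, perceptual)
        g_loss.backward()
        g_opt.step()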
In summary, the method of the invention provides a human-structure-based, posture-guided virtual human body image generation network for a source human body image and an arbitrary target human body posture image. First, posture estimation is performed on the input human body images to obtain the joint point coordinate sequences of the human body images. Then, a human-structure-based, posture-guided virtual human body image generation network (SAGN) is constructed, comprising an encoder, structure-based appearance generation modules (SAG-Blk) and a decoder, where each structure-based appearance generation module (SAG-Blk) consists of a structure-aware adaptive normalization (SAN) sub-module and a Feature Enhancement (FE) sub-module. Next, the loss function of the convolutional neural network is constructed, comprising an adversarial loss function, a perceptual loss function and a loss function based on human body structure similarity. Finally, the proposed convolutional neural network (SAGN) is jointly optimized with the loss functions to generate virtual human body images under the guidance of the target posture. Qualitative and quantitative comparative experiments against existing methods verify the effectiveness of the method on two public datasets, Market-1501 and DeepFashion.
Tables 1a and 1b are the results of the quantitative experiments of the present invention, respectively, with Table 1a being the results of the method under the Market-1501 data set and Table 1b being the results of the method under the DeepFashion data set.
TABLE 1a Experimental results of this method under Market-1501 data set
TABLE 1b Experimental results of this method under the DeepFashion data set
SSIM, IS and DS are common metrics for measuring image generation quality; the larger their values, the more realistic and higher-quality the generated images. FID and LPIPS are also common metrics for measuring image generation quality; the smaller their values, the more realistic and higher-quality the generated images. As can be seen from Table 1a, on the Market-1501 dataset the images generated by this method reach the best value on all metrics, in particular SSIM (structural similarity), which reaches the highest value of 0.321. As can be seen from Table 1b, on the DeepFashion dataset the images generated by this method reach the second-best level on SSIM, IS, DS and LPIPS, achieving a reliable image generation effect. Therefore, from the quantitative results, the virtual human body image generation method based on structural similarity can generate more realistic virtual human body images.
Figs. 5 and 6 show the qualitative experimental results of the present invention. Fig. 5 shows images generated by the present invention on the Market-1501 data set; it can be seen that the method generates virtual human body images with clear human body postures and realistic appearance. Especially in the case of large pose changes, the virtual human body images generated by the method still maintain the correct body pose (e.g., rows 2, 4 and 5). Fig. 6 shows images generated by the present invention on the DeepFashion data set; it can be seen that human body images generated by other methods tend to contain artificial traces, while the present method still maintains the correct human body posture and realistic appearance. Notably, even when the target posture is very complex, the method preserves the correctness and integrity of the generated virtual human body's posture, which makes the generated images look very realistic. Therefore, from the qualitative results, the virtual human body image generation method under posture guidance based on the human body structure can generate realistic human body images with correct postures.
In summary, the invention discloses a new method, system and electronic device for generating a virtual human body image under posture guidance based on the human body structure, belonging to the intersection of computer vision and computer graphics. The invention constructs a posture-guided virtual human body image generation network (SAGN) based on the human body structure, comprising an encoder, a structure-based appearance generation module (SAG-Blk) and a decoder, where the SAG-Blk consists of a structure-aware adaptive normalization (SAN) sub-module and a feature enhancement (FE) sub-module; a loss function of the convolutional neural network is then constructed, comprising an adversarial loss, a perceptual loss and a loss based on human body structure similarity; finally, the proposed convolutional neural network (SAGN) is jointly optimized with these losses to generate the virtual human body image under the guidance of the target posture. The invention can generate realistic virtual human body images with correct postures.
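As a purely illustrative companion to this description, the following PyTorch sketch shows one plausible reading of the SAN sub-module: pooling a style vector per source body part, broadcasting it into the matching target part mask, and predicting modulation parameters for a normalized target feature. Layer sizes, naming and network depth are assumptions and do not reproduce the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAwareNorm(nn.Module):
    """Sketch of structure-aware adaptive normalization (assumed layout)."""
    def __init__(self, ch):
        super().__init__()
        self.norm = nn.BatchNorm2d(ch, affine=False)
        self.to_gamma = nn.Conv2d(ch, ch, 3, padding=1)
        self.to_beta = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f_s, f_t, src_masks, tgt_masks):
        # f_s, f_t: (B, C, H, W) source / target features
        # src_masks, tgt_masks: (B, L, H', W') binary part masks
        B, C, H, W = f_s.shape
        m_s = F.interpolate(src_masks, size=(H, W), mode='nearest')   # Resize(.)
        m_t = F.interpolate(tgt_masks, size=(H, W), mode='nearest')
        t_sty = f_s.new_zeros(B, C, H, W)
        for l in range(m_s.shape[1]):
            area = m_s[:, l:l + 1].sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
            # mean-pooled style vector of the l-th source part
            v_l = (f_s * m_s[:, l:l + 1]).sum(dim=(2, 3), keepdim=True) / area
            # broadcast it into the l-th target part mask and accumulate
            t_sty = t_sty + v_l * m_t[:, l:l + 1]
        gamma, beta = self.to_gamma(t_sty), self.to_beta(t_sty)
        return gamma * self.norm(f_t) + beta    # stylized target pose feature
```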
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (10)

1. A virtual human body image generation method is characterized by comprising the following steps:
inputting a source human body image and a target posture image into a pre-trained virtual human body image generation network, and obtaining a target posture human body image output by the virtual human body image generation network;
wherein, the virtual human body image generation network is a convolution neural network, and comprises:
the encoder is used for inputting the source human body image and the target posture image, and encoding to obtain and output source human body characteristics and target human body characteristics;
the structure-based appearance generation module is used for inputting and updating the source human body characteristics and the target human body characteristics, acquiring and outputting the updated source human body characteristics and the updated target human body characteristics;
and the decoder is used for inputting the target human body characteristics output by the structure-based appearance generation module and decoding to obtain a target posture human body image.
2. The virtual human body image generation method according to claim 1, wherein the step of acquiring the trained virtual human body image generation network specifically includes:
acquiring a sample data set; each sample data in the sample data set comprises source human body image sample data, target human body image sample data, source human body posture sample data and target human body posture sample data;
inputting source human body image sample data, source human body posture sample data and target human body posture sample data in selected sample data of the sample data set into the virtual human body image generation network to obtain virtual target human body image data; constructing a loss function based on the virtual target human body image data and target human body image sample data in the selected sample data, and performing iterative optimization on the virtual human body image generation network;
and obtaining the trained virtual human body image generation network after reaching the preset iteration times or convergence conditions.
3. The virtual human body image generation method according to claim 1, wherein the structure-based appearance generation module comprises:
the structure perception self-adaptive normalization module is used for inputting the source human body characteristics and the target human body characteristics, generating stylized target posture characteristics and outputting the stylized target posture characteristics;
and the characteristic enhancement module is used for inputting the generated stylized target posture characteristic and the source human body characteristic and outputting the updated source human body characteristic and the updated target human body characteristic.
4. The virtual human body image generation method according to claim 2, wherein,
in the sample data set, the step of acquiring the source human body posture sample data and the target human body posture sample data in each sample data comprises: carrying out posture estimation on the human body image by adopting the OpenPose posture estimation method to obtain a coordinate sequence of 18 human body joint points; wherein the joint point coordinate sequence of the source human body image Is is expressed as P(Is) = {p1, …, pK}, K = 18, and the joint point coordinate sequence of the target human body image It is expressed as P(It) = {p1, …, pK}, K = 18; based on the obtained coordinate sequences of the human body joint points, K heat maps are used for representing the human body posture information, wherein the source human body posture information is expressed as Ps and the target human body posture information is expressed as Pt;
in the encoder, the step of encoding to obtain the source human body features and the target human body features specifically comprises: encoding the target posture information Pt into the target human body feature Ct with 2 down-sampling convolutional layers, and encoding the source human body image Is together with the source posture information Ps into the source human body feature Cs with 2 down-sampling convolutional layers.
5. The virtual human body image generation method according to claim 4, wherein the step of inputting and updating the source human body features and the target human body features in the structure-based appearance generation module to obtain the updated source human body features and the updated target human body features specifically comprises:
dividing the human body image into a plurality of human body parts and 1 background part based on the obtained human body joint point coordinate sequence to obtain L part masks; wherein the part masks of the source human body image are expressed as Ms and the part masks of the target human body image are expressed as Mt;
convolving the target human body feature Ct with two convolutional layers to obtain the target posture feature Ft, and convolving the source human body feature Cs with two convolutional layers to obtain the source feature Fs;
generating a style vector Vsty according to the source feature Fs and the part masks Ms of the source human body image; wherein each row Vsty^l of Vsty is a C-dimensional vector representing the feature of one part of the source human body image; the style vector of the l-th part is obtained by mean pooling: Vsty^l = Pool(Resize(Ms^l) ⊙ Fs), where Resize(·) represents a scaling operation, ⊙ represents element-by-element multiplication, and Pool(·) represents pooling;
according to the correspondence between the parts of the source human body image and the parts of the target human body image, inserting the style vectors Vsty into the corresponding parts of the part masks Mt of the target human body image to obtain a style matrix Tsty; wherein the l-th style vector Vsty^l is inserted into the l-th mask of the target human body image by broadcasting to generate the l-th style matrix Tsty^l, and all L style matrices Tsty^1, …, Tsty^L are added element by element to obtain the final style matrix Tsty;
convolving the style matrix Tsty with two convolutional layers to obtain the modulation parameters γ and β used in the normalization operation; carrying out batch normalization on the target posture feature Ft to obtain Fnorm; modulating Fnorm with γ and β to obtain the stylized target posture feature Fsty: Fsty = γ·Fnorm + β;
splicing and fusing the stylized target posture feature Fsty and the source feature Fs, and then enhancing the fused feature by a Squeeze-and-Excitation operation to obtain the enhanced feature Ffuse;
obtaining the updated target human body feature C't through a feature fusion operation that fuses Ct, Ft and Fs in a splicing and adding manner;
the updated source human body feature C's is the splicing of the source feature Fs and the updated target human body feature C't.
6. The virtual human body image generation method according to claim 5, wherein the loss function comprises: an adversarial loss function, a perceptual loss function, and a loss function based on human body structure similarity.
7. The virtual human body image generation method according to claim 1, wherein the structure-based appearance generation module is replaced with an integrated appearance generation module;
the integrated appearance generation module is composed of a plurality of the structure-based appearance generation modules connected in cascade.
8. A virtual human body image generation system, comprising:
the image generation module is used for inputting the source human body image and the target posture image into a pre-trained virtual human body image generation network and obtaining a target posture human body image output by the virtual human body image generation network;
wherein, the virtual human body image generation network is a convolution neural network, and comprises:
the encoder is used for inputting a source human body image and a target posture image, and encoding to obtain a source human body characteristic and a target human body characteristic;
the structure-based appearance generation module is used for inputting and updating the source human body characteristics and the target human body characteristics to obtain updated source human body characteristics and target human body characteristics;
and the decoder is used for inputting the target human body characteristics output by the structure-based appearance generation module and decoding to obtain a target posture human body image.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the virtual human body image generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the virtual human body image generation method according to any one of claims 1 to 7.
CN202110865481.2A 2021-07-29 2021-07-29 Virtual human body image generation method, system, equipment and medium Active CN113592971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110865481.2A CN113592971B (en) 2021-07-29 2021-07-29 Virtual human body image generation method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110865481.2A CN113592971B (en) 2021-07-29 2021-07-29 Virtual human body image generation method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN113592971A true CN113592971A (en) 2021-11-02
CN113592971B CN113592971B (en) 2024-04-16

Family

ID=78252264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110865481.2A Active CN113592971B (en) 2021-07-29 2021-07-29 Virtual human body image generation method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN113592971B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679046B1 (en) * 2016-11-29 2020-06-09 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Machine learning systems and methods of estimating body shape from images
WO2020168844A1 (en) * 2019-02-19 2020-08-27 Boe Technology Group Co., Ltd. Image processing method, apparatus, equipment, and storage medium
CN110852941A (en) * 2019-11-05 2020-02-28 中山大学 Two-dimensional virtual fitting method based on neural network
CN112116673A (en) * 2020-07-29 2020-12-22 西安交通大学 Virtual human body image generation method and system based on structural similarity under posture guidance and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张婧; 孙金根; 陈亮; 刘韵婷: "Single-person multi-pose image generation method based on unsupervised learning", 光电技术应用, no. 02 *
陈佳宇; 钟跃崎; 余志才: "Three-dimensional human body model reconstruction based on binary images", 毛纺科技, no. 09 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821811A (en) * 2022-06-21 2022-07-29 平安科技(深圳)有限公司 Method and device for generating person composite image, computer device and storage medium
CN114821811B (en) * 2022-06-21 2022-09-30 Ping An Technology (Shenzhen) Co., Ltd. Method and device for generating person composite image, computer device and storage medium

Also Published As

Publication number Publication date
CN113592971B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN116977522A (en) Rendering method and device of three-dimensional model, computer equipment and storage medium
CN115914505B (en) Video generation method and system based on voice-driven digital human model
CN113160035A (en) Human body image generation method based on posture guidance, style and shape feature constraints
CN113570685A (en) Image processing method and device, electronic device and storage medium
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN110751733A (en) Method and apparatus for converting 3D scanned object into avatar
CN113362422A (en) Shadow robust makeup transfer system and method based on decoupling representation
CN111462274A (en) Human body image synthesis method and system based on SMP L model
CN115049556A (en) StyleGAN-based face image restoration method
CN115018989B (en) Three-dimensional dynamic reconstruction method based on RGB-D sequence, training device and electronic equipment
CN112819951A (en) Three-dimensional human body reconstruction method with shielding function based on depth map restoration
CN117635771A (en) Scene text editing method and device based on semi-supervised contrast learning
CN117237542B (en) Three-dimensional human body model generation method and device based on text
CN113592971B (en) Virtual human body image generation method, system, equipment and medium
CN117593178A (en) Virtual fitting method based on feature guidance
CN112116673B (en) Virtual human body image generation method and system based on structural similarity under posture guidance and electronic equipment
CN116934972B (en) Three-dimensional human body reconstruction method based on double-flow network
CN111311732A (en) 3D human body grid obtaining method and device
CN116863044A (en) Face model generation method and device, electronic equipment and readable storage medium
CN114092610B (en) Character video generation method based on generation of confrontation network
CN116978057A (en) Human body posture migration method and device in image, computer equipment and storage medium
Motegi et al. Human motion generative model using variational autoencoder
CN114331894A (en) Face image restoration method based on potential feature reconstruction and mask perception
CN117893642B (en) Face shape remodelling and facial feature exchanging face changing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant