CN114049652A - Human body posture migration method and system based on action driving - Google Patents

Human body posture migration method and system based on action driving

Info

Publication number
CN114049652A
Authority
CN
China
Prior art keywords
image
human body
loss function
model
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111304351.8A
Other languages
Chinese (zh)
Inventor
许轶博
潘泽文
范宏伟
李佳斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aitneng Electric Technology Co ltd
Original Assignee
Chengdu Aitneng Electric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aitneng Electric Technology Co ltd filed Critical Chengdu Aitneng Electric Technology Co ltd
Priority to CN202111304351.8A priority Critical patent/CN114049652A/en
Publication of CN114049652A publication Critical patent/CN114049652A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body posture migration method and system based on action driving. The method explicitly and independently detects human body key points during both model training and inference, which increases the stability of human body posture migration, and adds a perceptual loss function based on the target human body during training, which improves the sharpness of the images output by the model.

Description

Human body posture migration method and system based on action driving
Technical Field
The invention relates to the technical field of image synthesis processing, and in particular to a human body posture migration method and system based on action driving.
Background
Human body posture migration takes a target person image and a person motion video and generates a new motion video in which the person is the target person from the given image, while the motion is consistent with the motion in the given video; this process is called posture migration. With the development of self-media, posture migration is increasingly widely applied. In existing posture migration, the key point information of an image is generally obtained first, the mapping relationship between the migrated object and the source object is derived from the key point information, and the migrated result is finally output through a model; see, for example, patent application CN111598977A, entitled "Method and system for transferring and animating expressions".
The following disadvantages still exist in the prior art:
1. The key points of the human body are obtained through training, and a large amount of data is needed to obtain stable results. Moreover, when part of the human body in the video is occluded, the detected key point positions become unstable, which disturbs the relative mapping relationship between key points and makes the generated character motion incoherent; this is especially noticeable when the generated image is large.
2. The prior art generally adopts reconstruction loss, i.e., a whole-image comparison, as the loss function. When the character moves, the sharpness of the character cannot be additionally enhanced during training, so when the resolution of the output video is high, the character appears blurry overall, and the blur of the limbs and limb edges is particularly obvious.
Disclosure of Invention
In order to solve the above problems, the invention provides a human body posture migration method and system based on action driving. It provides a high-precision human body key point detection model and explicitly separates the key point detection module from model training and inference, which increases the stability of human body posture migration. Meanwhile, a perceptual loss function based on the target human body is added during training, so that the model focuses on the sharpness of the migrated and reconstructed human body, improving the sharpness of the character in high-resolution images and the overall migration effect.
The invention provides a human body posture migration method based on action driving, which has the following specific technical scheme:
S1: acquiring human body action video data, extracting image frames from the video data to obtain a number of consecutive pictures, and screening the extracted pictures to obtain target pictures;
S2: detecting human body key points in the target pictures to obtain key point coordinates;
S3: randomly extracting two images from the target pictures as a source image and a driving image respectively, and calculating a transformation relation between the driving image and the source image according to the obtained key point coordinates;
S4: inputting the transformation relation into a motion estimation model, and outputting a corresponding optical flow map and redraw map;
S5: inputting the optical flow map and the redraw map into a motion generation model to obtain a posture-migration generated image;
S6: calculating a loss function based on the driving image and the posture-migration generated image, as follows:
S601: calculating a discriminator loss function L_D through a discriminator network model D;
S602: calculating a perceptual loss function L_J through a human body recognition model;
S603: combining the discriminator loss function with the perceptual loss function to output a final overall loss function L.
Further, the human body action video data is single-person motion video data, and the screening is to delete video data in which the human body is incomplete.
Further, in step S2, the pictures may be re-screened by human body key point detection, deleting data in which no human body key points can be detected or in which key points of multiple persons are detected.
Further, in step S5, the motion generation model adopts a generative adversarial network, and the posture-migration generated image is obtained as follows:
inputting the source image into the motion generation model to obtain a hidden-layer feature map of the source image, and splicing the hidden-layer feature map with the optical flow map;
multiplying the obtained splicing result by the redraw map, inputting the product into the decoder of the model, and outputting the posture-migration generated image.
Further, in step S601, the discriminator network model adopts a VGG16 model, and the loss function adopts a cross-entropy loss function, which is expressed as follows:
L_D = -y·log(D(x)) - (1-y)·log(1-D(x))
wherein x is the input image and y is the image label.
Further, in step S602, the perceptual loss function L_J is calculated as follows:
extracting, through a human body recognition model, the hidden-layer features of the human body in the posture-migration generated image and in the driving image;
calculating the difference between the features correspondingly extracted from the posture-migration generated image and the driving image as the perceptual loss function, with the formula:
L_J = ||J(D_g) - J(Q)||
where D_g is the posture-migration generated image and Q is the driving image.
Further, the feature difference is the distance between the last-layer hidden feature vectors of the model obtained when the posture-migration generated image and the driving image are respectively input into the model.
Further, in step S603, the perceptual loss function and the discriminator loss function are combined as follows:
L = w_1·L_J + w_2·L_D
where w_1 and w_2 are the weight coefficients of the respective loss functions.
The invention also provides a human body posture migration system, which comprises a data module, a motion estimation module, a motion generation module and a loss function module;
the data module is used for collecting human body action videos and randomly extracting image frames to obtain source images, driving image data and corresponding human body key point coordinates;
the motion estimation module is connected with the data module and is used for receiving the source image, the driving image and the human body key point coordinate data, and outputting an optical flow map and a redraw map;
the motion generation module is connected with the motion estimation module and the data module and is used for receiving the optical flow map, the redraw map and the source image, splicing the hidden-layer feature map of the source image with the optical flow map, multiplying the splicing result by the redraw map, and finally outputting the posture-migration generated image;
the loss function module is connected with the motion generation module and the data module and is used for receiving the driving image and the generated image, calculating the perceptual loss function and the discriminator loss function, and combining them to output the total loss function.
The invention has the following beneficial effects:
1. The character posture migration video is obtained by driving the source character image with a character action video; the human body key point coordinates of the source image and the driving image are obtained with a high-precision human body key point detection model, which improves the stability of human body posture migration and reduces the amount of data the model needs to learn key point information.
2. During model training, the perceptual loss function of the target human body and the discriminator loss function are combined into the final loss function, so that the model focuses on human body information and the sharpness of the migrated and reconstructed human body is improved.
Drawings
FIG. 1 is a schematic view of the model structure of the present invention;
FIG. 2 is a schematic flow chart of the method of the present invention.
Detailed Description
In the following, the technical solutions in the embodiments of the present invention are described clearly and completely. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments herein without creative effort fall within the protection scope of the present invention.
The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1
An embodiment of the present invention provides a human body posture migration method based on action driving; as shown in FIG. 2, the method includes the following steps:
S1: acquiring human body action video data, extracting image frames from the video data to obtain a number of consecutive pictures, and screening the extracted pictures to obtain target pictures;
The human body action video data is single-person motion video data; the specific type of motion is not restricted. A video resolution of 1080p or above is preferred, and the human body should occupy most of the frame and be complete. The screening deletes video data in which the human body is incomplete.
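For illustration only, a minimal Python sketch of the frame-extraction step (OpenCV assumed; the function name and sampling stride are placeholders, not part of the disclosure):

    import cv2

    def extract_frames(video_path, stride=1):
        """Extract consecutive frames from a single-person action video (sketch)."""
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % stride == 0:
                frames.append(frame)  # BGR image; one candidate target picture
            idx += 1
        cap.release()
        return frames

Screening (deleting frames in which the human body is incomplete) would then be applied to the returned list, for example by using the key point detection of step S2.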
S2: detecting key points of the human body on the target picture to obtain key point coordinates;
the method comprises the steps of carrying out human body detection on a target object through a human body key point detection model to obtain key point information, wherein in the embodiment, the key point detection is carried out through a model with higher precision, such as Densepose, Mediapipe and the like;
To speed up subsequent training, key point detection can be performed on the data in advance and the detection results input synchronously during training. Key point detection can also be used for preliminary screening and preprocessing of the data: data in which no human body key points can be detected, or in which key points of multiple persons are detected, are deleted; pictures in which the human body region occupies too small a proportion are cropped according to the detected key points; and if the key point model's detection deviates on individual samples, the key point coordinates can be corrected manually to ensure the accuracy of the data.
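As an illustration of this preprocessing, a minimal sketch using MediaPipe Pose, one of the models named above (the helper name and screening rule are assumptions):

    import cv2
    import mediapipe as mp

    def detect_keypoints(image_bgr):
        """Return normalized (x, y) body key points, or None when no human
        body is detected (such frames are screened out). Sketch only."""
        with mp.solutions.pose.Pose(static_image_mode=True) as pose:
            result = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks is None:
            return None
        return [(lm.x, lm.y) for lm in result.pose_landmarks.landmark]

MediaPipe Pose tracks a single person, which matches the single-person data above; filtering out multi-person frames would need a separate detector.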
S3: randomly extracting two images from the target picture as a source image and a driving image respectively, and calculating a transformation relation between the driving image and the source image according to the obtained key point coordinates;
Let the key point coordinates of the source image be k_o and the key point coordinates of the driving image be k_t. The affine transformation from k_o to k_t is:

k_t = A·k_o + b

where A and b are the affine transformation parameters: A is the linear mapping matrix and b is the translation parameter. Solving the above system yields A and b; each pair of key points has its own corresponding set of affine transformation parameters.
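A NumPy sketch of solving these parameters by least squares (illustrative; the patent does not prescribe a particular solver):

    import numpy as np

    def solve_affine(k_o, k_t):
        """Solve k_t ≈ A @ k_o + b from corresponding key points (sketch).
        k_o, k_t: (N, 2) arrays of source / driving key point coordinates;
        requires at least 3 non-collinear correspondences."""
        n = k_o.shape[0]
        X = np.hstack([k_o, np.ones((n, 1))])        # rows [x, y, 1]
        P, *_ = np.linalg.lstsq(X, k_t, rcond=None)  # X @ P ≈ k_t, P is 3x2
        A = P[:2].T  # 2x2 linear mapping matrix
        b = P[2]     # translation parameter
        return A, b

Fitting one such (A, b) per key point, e.g. from a local neighborhood of correspondences, gives each pair of key points its own parameter set.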
S4: inputting the transformation relation into the motion estimation model, and outputting the corresponding optical flow map and redraw map;
In this embodiment, the motion estimation model is composed of basic neural network structures such as convolutional layers, fully connected layers, activation layers, pooling layers and normalization layers; the concrete structure may be a UNet network or another Encoder-Decoder model structure.
The input of the motion estimation model is the affine transformation parameters between the key points of the driving image and the source image; after passing through the network's convolutional, fully connected, activation, pooling and normalization layers, the final output is an optical flow map L and a redraw map M. The optical flow map represents the transformation from each pixel of the driving image to the source image, and the redraw map marks the regions that need to be redrawn.
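A minimal PyTorch sketch of such a model (the layer sizes, the 64×64 output resolution and the two-head output layout are illustrative assumptions; the patent fixes only the input/output interface):

    import torch
    import torch.nn as nn

    class MotionEstimator(nn.Module):
        """Affine parameters in; optical flow map L and redraw map M out (sketch)."""
        def __init__(self, n_params):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(n_params, 128 * 8 * 8), nn.ReLU())
            self.decode = nn.Sequential(  # 8x8 -> 64x64 via three stride-2 upsamples
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),  # 2 flow + 1 mask channels
            )

        def forward(self, affine_params):
            x = self.fc(affine_params).view(-1, 128, 8, 8)
            out = self.decode(x)
            flow = out[:, :2]                  # optical flow map L
            mask = torch.sigmoid(out[:, 2:])   # redraw map M in [0, 1]
            return flow, mask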
S5: inputting the optical flow map and the redraw map into the motion generation model to obtain the posture-migration generated image;
The motion generation model adopts a generative adversarial network and is composed of a super-resolution model comprising an encoder E and a decoder G. The input of the encoder E is the source image S, and its output is the hidden-layer feature f_E of the source image; the optical flow map L is spliced with the output features of the encoder E, the splicing result is multiplied by the redraw map M to obtain the input of the decoder G, and the decoder finally produces the migrated generated image D_g, with the formula:

D_g = G(M ⊙ (E(S), L))
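A minimal sketch of this composition (PyTorch; the encoder and decoder are passed in as modules, and the flow and redraw maps are assumed to be resized to the feature resolution):

    import torch

    def generate(encoder, decoder, source, flow, mask):
        """D_g = G(M ⊙ (E(S), L)) — sketch of the generation step."""
        feats = encoder(source)                  # E(S): hidden-layer feature map
        fused = torch.cat([feats, flow], dim=1)  # splice features with flow map L
        return decoder(mask * fused)             # gate by redraw map M, decode to D_g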
S6: calculating a loss function based on the driving image and the posture-migration generated image, as follows:
S601: calculating the discriminator loss function L_D through the discriminator network model D;
The discriminator network model is a binary classification neural network that judges whether the input image is real. In this embodiment, a VGG16 model is selected as the discriminator network, and the loss function adopts a cross-entropy loss function.
The specific formula is as follows:
L_D = -y·log(D(x)) - (1-y)·log(1-D(x))
where x is the input image and y is the image label: if x is an original image, y = 1; if x is an image generated by the motion generation model, y = 0.
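A minimal PyTorch sketch of this loss over a batch of real (driving) and generated images (the helper name is an assumption):

    import torch
    import torch.nn.functional as F

    def discriminator_loss(disc, real, fake):
        """L_D summed over real images (y = 1) and generated images (y = 0).
        disc is assumed to output a realness probability in (0, 1); pass
        fake.detach() when updating only the discriminator. Sketch only."""
        p_real, p_fake = disc(real), disc(fake)
        return (F.binary_cross_entropy(p_real, torch.ones_like(p_real)) +
                F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake)))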
S602: calculating the perceptual loss function L_J through a human body recognition model;
The hidden-layer features of the human body in the posture-migration generated image and in the driving image are extracted through a human body recognition model; the human body recognition model may be any human body detection model, such as the CMU human body detection model, whose backbone network adopts a ResNet50;
The difference between the features correspondingly extracted from the posture-migration generated image and the driving image is calculated and used as the perceptual loss function; the specific formula is as follows:
L_J = ||J(D_g) - J(Q)||
where D_g is the posture-migration generated image and Q is the driving image.
The feature difference is the distance between the last-layer hidden feature vectors of the model obtained when the posture-migration generated image and the driving image are respectively input into the model.
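A minimal sketch of L_J (PyTorch; J stands for the recognition model's last-layer feature extractor, and the L2 norm is an assumption, since the patent does not fix the distance metric):

    import torch

    def perceptual_loss(J, d_g, q):
        """L_J = ||J(D_g) - J(Q)||: feature distance between the generated and
        driving images under a human body recognition model J (sketch)."""
        with torch.no_grad():
            f_q = J(q)   # driving-image features; no gradient needed
        f_g = J(d_g)     # generated-image features keep gradients for training
        return torch.norm(f_g - f_q, p=2)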
S603: combining the discriminator loss function with the perceptual loss function to output the final overall loss function L, as follows:
L = w_1·L_J + w_2·L_D
where w_1 and w_2 are the weight coefficients of the loss functions, set manually as appropriate.
Back propagation is performed according to the obtained loss function, the model parameter weights are optimized with stochastic gradient descent (SGD), and training ends when a set number of rounds is reached or the loss falls below a given threshold.
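Putting the pieces together, a sketch of one optimization step (reusing the loss sketches above; the weight values and optimizer settings are illustrative assumptions, not values from the patent):

    import torch

    def train_step(optimizer, disc, J, d_g, q, w1=1.0, w2=0.1):
        """One SGD step on the combined loss L = w1*L_J + w2*L_D (sketch);
        d_g must be produced in-graph by the motion generation model."""
        loss = w1 * perceptual_loss(J, d_g, q) + w2 * discriminator_loss(disc, q, d_g)
        optimizer.zero_grad()
        loss.backward()   # back-propagate the combined loss
        optimizer.step()  # SGD update of the model parameter weights
        return loss.item()

    # Usage sketch: optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)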
Example 2
Embodiment 2 of the present invention provides a human body posture migration system based on action driving; as shown in FIG. 1, the system includes a data module, a motion estimation module, a motion generation module and a loss function module;
the data module is used for collecting human body action videos and randomly extracting image frames to obtain source images, driving image data and corresponding human body key point coordinates;
the motion estimation module is connected with the data module and is used for receiving the source image, the driving image and the human body key point coordinate data, and outputting an optical flow map and a redraw map;
the motion generation module is connected with the motion estimation module and the data module and is used for receiving the optical flow map, the redraw map and the source image, splicing the hidden-layer feature map of the source image with the optical flow map, multiplying the splicing result by the redraw map, and finally outputting the posture-migration generated image;
the loss function module is connected with the motion generation module and the data module and is used for receiving the driving image and the generated image, calculating the perceptual loss function and the discriminator loss function, and combining them to output the total loss function.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification, and to any novel method or process step or any novel combination of steps disclosed.

Claims (9)

1. A human body posture migration method based on action driving, characterized by comprising the following steps:
S1: acquiring human body action video data, extracting image frames from the video data to obtain a number of consecutive pictures, and screening the extracted pictures to obtain target pictures;
S2: detecting human body key points in the target pictures to obtain key point coordinates;
S3: randomly extracting two images from the target pictures as a source image and a driving image respectively, and calculating a transformation relation between the driving image and the source image according to the obtained key point coordinates;
S4: inputting the transformation relation into a motion estimation model, and outputting a corresponding optical flow map and redraw map;
S5: inputting the optical flow map and the redraw map into a motion generation model to obtain a posture-migration generated image;
S6: calculating a loss function based on the driving image and the posture-migration generated image, as follows:
S601: calculating a discriminator loss function L_D through a discriminator network model D;
S602: calculating a perceptual loss function L_J through a human body recognition model;
S603: combining the discriminator loss function with the perceptual loss function to output a final overall loss function L.
2. The human body posture migration method according to claim 1, wherein the human body action video data is single-person motion video data, and the screening is deleting video data in which the human body is incomplete.
3. The human body posture migration method according to claim 1, wherein in step S2 the pictures are further re-screened by human body key point detection, deleting data in which no human body key points can be detected or in which key points of multiple persons are detected.
4. The human body posture migration method according to claim 1, wherein in step S5 the motion generation model adopts a generative adversarial network, and the posture-migration generated image is obtained as follows:
inputting the source image into the motion generation model to obtain a hidden-layer feature map of the source image, and splicing the hidden-layer feature map with the optical flow map;
multiplying the obtained splicing result by the redraw map, inputting the product into the decoder of the model, and outputting the posture-migration generated image.
5. The human body posture migration method according to claim 1, wherein in step S601 the discriminator network model adopts a VGG16 model and the loss function adopts a cross-entropy loss function, with the formula:
L_D = -y·log(D(x)) - (1-y)·log(1-D(x))
where x is the input image and y is the image label.
6. The human body posture migration method according to claim 5, wherein in step S602 the perceptual loss function L_J is calculated as follows:
extracting, through a human body recognition model, the hidden-layer features of the human body in the posture-migration generated image and in the driving image;
calculating the difference between the features correspondingly extracted from the posture-migration generated image and the driving image as the perceptual loss function, with the formula:
L_J = ||J(D_g) - J(Q)||
where D_g is the posture-migration generated image and Q is the driving image.
7. The human body posture migration method according to claim 6, wherein the feature difference is the distance between the last-layer hidden feature vectors of the model obtained when the posture-migration generated image and the driving image are respectively input into the model.
8. The human body posture migration method according to claim 6, wherein in step S603 the perceptual loss function and the discriminator loss function are combined as follows:
L = w_1·L_J + w_2·L_D
where w_1 and w_2 are the weight coefficients of the respective loss functions.
9. A human body posture migration system, characterized by comprising a data module, a motion estimation module, a motion generation module and a loss function module;
the data module is used for collecting human body action videos and randomly extracting image frames to obtain source images, driving image data and corresponding human body key point coordinates;
the motion estimation module is connected with the data module and is used for receiving the source image, the driving image and the human body key point coordinate data, and outputting an optical flow map and a redraw map;
the motion generation module is connected with the motion estimation module and the data module and is used for receiving the optical flow map, the redraw map and the source image, splicing the hidden-layer feature map of the source image with the optical flow map, multiplying the splicing result by the redraw map, and finally outputting the posture-migration generated image;
the loss function module is connected with the motion generation module and the data module and is used for receiving the driving image and the generated image, calculating the perceptual loss function and the discriminator loss function, and combining them to output the total loss function.
CN202111304351.8A 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving Pending CN114049652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111304351.8A CN114049652A (en) 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111304351.8A CN114049652A (en) 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving

Publications (1)

Publication Number Publication Date
CN114049652A 2022-02-15

Family

ID=80207224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111304351.8A Pending CN114049652A (en) 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving

Country Status (1)

Country Link
CN (1) CN114049652A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821811A (en) * 2022-06-21 2022-07-29 平安科技(深圳)有限公司 Method and device for generating person composite image, computer device and storage medium
CN114821811B (en) * 2022-06-21 2022-09-30 平安科技(深圳)有限公司 Method and device for generating person composite image, computer device and storage medium
CN114783039A (en) * 2022-06-22 2022-07-22 南京信息工程大学 Motion migration method driven by 3D human body model

Similar Documents

Publication Publication Date Title
CN110276316B (en) Human body key point detection method based on deep learning
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN109508654A (en) Merge the human face analysis method and system of multitask and multiple dimensioned convolutional neural networks
CN112733797B (en) Method, device and equipment for correcting sight of face image and storage medium
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN112926396A (en) Action identification method based on double-current convolution attention
CN114049652A (en) Human body posture migration method and system based on action driving
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN111401293A (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
US20220351547A1 (en) Gesture analysis method and device, and computer-readable storage medium
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN113570658A (en) Monocular video depth estimation method based on depth convolutional network
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN113610046A (en) Behavior identification method based on depth video linkage characteristics
CN115471611A (en) Method for improving visual effect of 3DMM face model
CN113076905B (en) Emotion recognition method based on context interaction relation
CN111274901B (en) Gesture depth image continuous detection method based on depth gating recursion unit
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
CN113838102B (en) Optical flow determining method and system based on anisotropic dense convolution
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network
CN115346259A (en) Multi-granularity academic emotion recognition method combined with context information
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN111611997A (en) Cartoon customized image motion video generation method based on human body action migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination