CN114049652A - Human body posture migration method and system based on action driving - Google Patents

Human body posture migration method and system based on action driving

Info

Publication number
CN114049652A
Authority
CN
China
Prior art keywords
image
human body
loss function
model
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111304351.8A
Other languages
Chinese (zh)
Inventor
许轶博
潘泽文
范宏伟
李佳斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aitneng Electric Technology Co ltd
Original Assignee
Chengdu Aitneng Electric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aitneng Electric Technology Co ltd filed Critical Chengdu Aitneng Electric Technology Co ltd
Priority to CN202111304351.8A priority Critical patent/CN114049652A/en
Publication of CN114049652A publication Critical patent/CN114049652A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body posture migration method and system based on action driving. The method explicitly and independently detects human body key points during both model training and inference, which increases the stability of human body posture migration, and adds a perceptual loss function based on the target human body during training, which improves the sharpness of the images output by the model.

Description

Human body posture migration method and system based on action driving
Technical Field
The invention relates to the technical field of image synthesis processing, and in particular to a human body posture migration method and system based on action driving.
Background
Human body posture migration takes a target person image and a person motion video and generates a new motion video in which the person is the target person from the given image, while the motion is consistent with the motion in the given video; this process is called posture migration. With the development of self-media, posture migration is increasingly widely applied. In existing posture migration, the key point information of an image is generally obtained first, the mapping relationship between the migrated object and the source object is derived from the key point information, and the migrated result is finally output through a model; see, for example, patent application CN111598977A, entitled "Method and system for transferring and animating expressions".
The following disadvantages still exist in the prior art:
1. The key points of the human body are obtained through training, and a large amount of data is needed to obtain stable results. Moreover, when part of the human body in the video is occluded, the detected key point positions become unstable, which disturbs the relative mapping relationship between key points and makes the generated character motion incoherent; this is especially noticeable when the generated image is large.
2. The prior art generally adopts reconstruction loss, i.e., a whole-image comparison, as the loss function. When the character moves, the sharpness of the character cannot be additionally enhanced during training, so when the resolution of the output video is high, the character appears blurry overall, and the blur of the limbs and limb edges is particularly obvious.
Disclosure of Invention
In order to solve the above problems, the invention provides a human body posture migration method and system based on action driving. It provides a high-precision human body key point detection model and explicitly separates the key point detection module from model training and inference, which increases the stability of human body posture migration. Meanwhile, a perceptual loss function based on the target human body is added during training, so that the model focuses on the sharpness of the migrated and reconstructed human body, improving the sharpness of the character in high-resolution images and the overall migration effect.
The invention provides a human body posture migration method based on action driving, which has the following specific technical scheme:
S1: acquiring human body action video data, extracting image frames from the video data to obtain a number of consecutive pictures, and screening the extracted pictures to obtain target pictures;
S2: detecting human body key points in the target pictures to obtain key point coordinates;
S3: randomly extracting two images from the target pictures as a source image and a driving image respectively, and calculating a transformation relation between the driving image and the source image according to the obtained key point coordinates;
S4: inputting the transformation relation into a motion estimation model, and outputting a corresponding optical flow map and redraw map;
S5: inputting the optical flow map and the redraw map into a motion generation model to obtain a posture-migration generated image;
S6: calculating a loss function based on the driving image and the posture-migration generated image, as follows:
S601: calculating a discriminator loss function L_D through a discriminator network model D;
S602: calculating a perceptual loss function L_J through a human body recognition model;
S603: combining the discriminator loss function with the perceptual loss function to output a final overall loss function L.
Further, the human body action video data is single-person motion video data, and the screening is to delete video data in which the human body is incomplete.
Further, in step S2, the pictures may be re-screened by human body key point detection, deleting data in which no human body key points can be detected or in which key points of multiple persons are detected.
Further, in step S5, the motion generation model adopts a generative adversarial network, and the posture-migration generated image is obtained as follows:
inputting the source image into the motion generation model to obtain a hidden-layer feature map of the source image, and splicing the hidden-layer feature map with the optical flow map;
multiplying the obtained splicing result by the redraw map, inputting the product into the decoder of the model, and outputting the posture-migration generated image.
Further, in step S601, the discriminator network model adopts a VGG16 model, and the loss function adopts a cross-entropy loss function, which is expressed as follows:
L_D = -y·log(D(x)) - (1-y)·log(1-D(x))
wherein x is the input image and y is the image label.
Further, in step S602, the perceptual loss function L_J is calculated as follows:
extracting, through a human body recognition model, the hidden-layer features of the human body in the posture-migration generated image and in the driving image;
calculating the difference between the features correspondingly extracted from the posture-migration generated image and the driving image as the perceptual loss function, with the formula:
L_J = ||J(D_g) - J(Q)||
where D_g is the posture-migration generated image and Q is the driving image.
Further, the feature difference is the distance between the last-layer hidden feature vectors of the model obtained when the posture-migration generated image and the driving image are respectively input into the model.
Further, in step S603, the perceptual loss function and the discriminator loss function are combined as follows:
L = w_1·L_J + w_2·L_D
where w_1 and w_2 are the weight coefficients of the respective loss functions.
The invention also provides a human body posture migration system, which comprises a data module, a motion estimation module, a motion generation module and a loss function module;
the data module is used for collecting human body action videos and randomly extracting image frames to obtain source images, driving image data and corresponding human body key point coordinates;
the motion estimation module is connected with the data module and is used for receiving the source image, the driving image and the human body key point coordinate data, and outputting an optical flow map and a redraw map;
the motion generation module is connected with the motion estimation module and the data module and is used for receiving the optical flow map, the redraw map and the source image, splicing the hidden-layer feature map of the source image with the optical flow map, multiplying the splicing result by the redraw map, and finally outputting the posture-migration generated image;
the loss function module is connected with the motion generation module and the data module and is used for receiving the driving image and the generated image, calculating the perceptual loss function and the discriminator loss function, and combining them to output the total loss function.
The invention has the following beneficial effects:
1. The character posture migration video is obtained by driving the source character image with a character action video; the human body key point coordinates of the source image and the driving image are obtained with a high-precision human body key point detection model, which improves the stability of human body posture migration and reduces the amount of data the model needs to learn key point information.
2. During model training, the perceptual loss function of the target human body and the discriminator loss function are combined into the final loss function, so that the model focuses on human body information and the sharpness of the migrated and reconstructed human body is improved.
Drawings
FIG. 1 is a schematic view of the model structure of the present invention;
FIG. 2 is a schematic flow chart of the method of the present invention.
Detailed Description
In the following, the technical solutions in the embodiments of the present invention are described clearly and completely. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments herein without creative effort fall within the protection scope of the present invention.
The technical contents of the invention are described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1
An embodiment of the present invention provides a human body posture migration method based on action driving; as shown in FIG. 2, the method includes the following steps:
S1: acquiring human body action video data, extracting image frames from the video data to obtain a number of consecutive pictures, and screening the extracted pictures to obtain target pictures;
The human body action video data is single-person motion video data; the specific type of motion is not restricted. A video resolution of 1080p or above is preferred, and the human body should occupy most of the frame and be complete. The screening deletes video data in which the human body is incomplete.
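For illustration only, a minimal Python sketch of the frame-extraction step (OpenCV assumed; the function name and sampling stride are placeholders, not part of the disclosure):

    import cv2

    def extract_frames(video_path, stride=1):
        """Extract consecutive frames from a single-person action video (sketch)."""
        cap = cv2.VideoCapture(video_path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % stride == 0:
                frames.append(frame)  # BGR image; one candidate target picture
            idx += 1
        cap.release()
        return frames

Screening (deleting frames in which the human body is incomplete) would then be applied to the returned list, for example by using the key point detection of step S2.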
S2: detecting key points of the human body on the target picture to obtain key point coordinates;
the method comprises the steps of carrying out human body detection on a target object through a human body key point detection model to obtain key point information, wherein in the embodiment, the key point detection is carried out through a model with higher precision, such as Densepose, Mediapipe and the like;
To speed up subsequent training, key point detection can be performed on the data in advance and the detection results input synchronously during training. Key point detection can also be used for preliminary screening and preprocessing of the data: data in which no human body key points can be detected, or in which key points of multiple persons are detected, are deleted; pictures in which the human body region occupies too small a proportion are cropped according to the detected key points; and if the key point model's detection deviates on individual samples, the key point coordinates can be corrected manually to ensure the accuracy of the data.
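As an illustration of this preprocessing, a minimal sketch using MediaPipe Pose, one of the models named above (the helper name and screening rule are assumptions):

    import cv2
    import mediapipe as mp

    def detect_keypoints(image_bgr):
        """Return normalized (x, y) body key points, or None when no human
        body is detected (such frames are screened out). Sketch only."""
        with mp.solutions.pose.Pose(static_image_mode=True) as pose:
            result = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks is None:
            return None
        return [(lm.x, lm.y) for lm in result.pose_landmarks.landmark]

MediaPipe Pose tracks a single person, which matches the single-person data above; filtering out multi-person frames would need a separate detector.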
S3: randomly extracting two images from the target picture as a source image and a driving image respectively, and calculating a transformation relation between the driving image and the source image according to the obtained key point coordinates;
Let the key point coordinates of the source image be k_o and the key point coordinates of the driving image be k_t. The affine transformation from k_o to k_t is:

k_t = A·k_o + b

where A and b are the affine transformation parameters: A is the linear mapping matrix and b is the translation parameter. Solving the above system yields A and b; each pair of key points has its own corresponding set of affine transformation parameters.
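A NumPy sketch of solving these parameters by least squares (illustrative; the patent does not prescribe a particular solver):

    import numpy as np

    def solve_affine(k_o, k_t):
        """Solve k_t ≈ A @ k_o + b from corresponding key points (sketch).
        k_o, k_t: (N, 2) arrays of source / driving key point coordinates;
        requires at least 3 non-collinear correspondences."""
        n = k_o.shape[0]
        X = np.hstack([k_o, np.ones((n, 1))])        # rows [x, y, 1]
        P, *_ = np.linalg.lstsq(X, k_t, rcond=None)  # X @ P ≈ k_t, P is 3x2
        A = P[:2].T  # 2x2 linear mapping matrix
        b = P[2]     # translation parameter
        return A, b

Fitting one such (A, b) per key point, e.g. from a local neighborhood of correspondences, gives each pair of key points its own parameter set.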
S4: inputting the transformation relation into the motion estimation model, and outputting the corresponding optical flow map and redraw map;
In this embodiment, the motion estimation model is composed of basic neural network structures such as convolutional layers, fully connected layers, activation layers, pooling layers and normalization layers; the concrete structure may be a UNet network or another Encoder-Decoder model structure.
The input of the motion estimation model is the affine transformation parameters between the key points of the driving image and the source image; after passing through the network's convolutional, fully connected, activation, pooling and normalization layers, the final output is an optical flow map L and a redraw map M. The optical flow map represents the transformation from each pixel of the driving image to the source image, and the redraw map marks the regions that need to be redrawn.
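A minimal PyTorch sketch of such a model (the layer sizes, the 64×64 output resolution and the two-head output layout are illustrative assumptions; the patent fixes only the input/output interface):

    import torch
    import torch.nn as nn

    class MotionEstimator(nn.Module):
        """Affine parameters in; optical flow map L and redraw map M out (sketch)."""
        def __init__(self, n_params):
            super().__init__()
            self.fc = nn.Sequential(nn.Linear(n_params, 128 * 8 * 8), nn.ReLU())
            self.decode = nn.Sequential(  # 8x8 -> 64x64 via three stride-2 upsamples
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),  # 2 flow + 1 mask channels
            )

        def forward(self, affine_params):
            x = self.fc(affine_params).view(-1, 128, 8, 8)
            out = self.decode(x)
            flow = out[:, :2]                  # optical flow map L
            mask = torch.sigmoid(out[:, 2:])   # redraw map M in [0, 1]
            return flow, mask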
S5: inputting the optical flow map and the redraw map into the motion generation model to obtain the posture-migration generated image;
The motion generation model adopts a generative adversarial network and is composed of a super-resolution model comprising an encoder E and a decoder G. The input of the encoder E is the source image S, and its output is the hidden-layer feature f_E of the source image; the optical flow map L is spliced with the output features of the encoder E, the splicing result is multiplied by the redraw map M to obtain the input of the decoder G, and the decoder finally produces the migrated generated image D_g, with the formula:

D_g = G(M ⊙ (E(S), L))
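A minimal sketch of this composition (PyTorch; the encoder and decoder are passed in as modules, and the flow and redraw maps are assumed to be resized to the feature resolution):

    import torch

    def generate(encoder, decoder, source, flow, mask):
        """D_g = G(M ⊙ (E(S), L)) — sketch of the generation step."""
        feats = encoder(source)                  # E(S): hidden-layer feature map
        fused = torch.cat([feats, flow], dim=1)  # splice features with flow map L
        return decoder(mask * fused)             # gate by redraw map M, decode to D_g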
S6: calculating a loss function based on the driving image and the posture-migration generated image, as follows:
S601: calculating the discriminator loss function L_D through the discriminator network model D;
The discriminator network model is a binary classification neural network that judges whether the input image is real. In this embodiment, a VGG16 model is selected as the discriminator network, and the loss function adopts a cross-entropy loss function.
The specific formula is as follows:
L_D = -y·log(D(x)) - (1-y)·log(1-D(x))
where x is the input image and y is the image label: if x is an original image, y = 1; if x is an image generated by the motion generation model, y = 0.
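A minimal PyTorch sketch of this loss over a batch of real (driving) and generated images (the helper name is an assumption):

    import torch
    import torch.nn.functional as F

    def discriminator_loss(disc, real, fake):
        """L_D summed over real images (y = 1) and generated images (y = 0).
        disc is assumed to output a realness probability in (0, 1); pass
        fake.detach() when updating only the discriminator. Sketch only."""
        p_real, p_fake = disc(real), disc(fake)
        return (F.binary_cross_entropy(p_real, torch.ones_like(p_real)) +
                F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake)))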
S602: calculating the perceptual loss function L_J through a human body recognition model;
The hidden-layer features of the human body in the posture-migration generated image and in the driving image are extracted through a human body recognition model; the human body recognition model may be any human body detection model, such as the CMU human body detection model, whose backbone network adopts a ResNet50;
The difference between the features correspondingly extracted from the posture-migration generated image and the driving image is calculated and used as the perceptual loss function; the specific formula is as follows:
L_J = ||J(D_g) - J(Q)||
where D_g is the posture-migration generated image and Q is the driving image.
The feature difference is the distance between the last-layer hidden feature vectors of the model obtained when the posture-migration generated image and the driving image are respectively input into the model.
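A minimal sketch of L_J (PyTorch; J stands for the recognition model's last-layer feature extractor, and the L2 norm is an assumption, since the patent does not fix the distance metric):

    import torch

    def perceptual_loss(J, d_g, q):
        """L_J = ||J(D_g) - J(Q)||: feature distance between the generated and
        driving images under a human body recognition model J (sketch)."""
        with torch.no_grad():
            f_q = J(q)   # driving-image features; no gradient needed
        f_g = J(d_g)     # generated-image features keep gradients for training
        return torch.norm(f_g - f_q, p=2)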
S603: combining the discriminator loss function with the perceptual loss function to output the final overall loss function L, as follows:
L = w_1·L_J + w_2·L_D
where w_1 and w_2 are the weight coefficients of the loss functions, set manually as appropriate.
Back propagation is performed according to the obtained loss function, the model parameter weights are optimized with stochastic gradient descent (SGD), and training ends when a set number of rounds is reached or the loss falls below a given threshold.
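Putting the pieces together, a sketch of one optimization step (reusing the loss sketches above; the weight values and optimizer settings are illustrative assumptions, not values from the patent):

    import torch

    def train_step(optimizer, disc, J, d_g, q, w1=1.0, w2=0.1):
        """One SGD step on the combined loss L = w1*L_J + w2*L_D (sketch);
        d_g must be produced in-graph by the motion generation model."""
        loss = w1 * perceptual_loss(J, d_g, q) + w2 * discriminator_loss(disc, q, d_g)
        optimizer.zero_grad()
        loss.backward()   # back-propagate the combined loss
        optimizer.step()  # SGD update of the model parameter weights
        return loss.item()

    # Usage sketch: optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)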
Example 2
Embodiment 2 of the present invention provides a human body posture migration system based on action driving; as shown in FIG. 1, the system includes a data module, a motion estimation module, a motion generation module and a loss function module;
the data module is used for collecting human body action videos and randomly extracting image frames to obtain source images, driving image data and corresponding human body key point coordinates;
the motion estimation module is connected with the data module and is used for receiving the source image, the driving image and the human body key point coordinate data, and outputting an optical flow map and a redraw map;
the motion generation module is connected with the motion estimation module and the data module and is used for receiving the optical flow map, the redraw map and the source image, splicing the hidden-layer feature map of the source image with the optical flow map, multiplying the splicing result by the redraw map, and finally outputting the posture-migration generated image;
the loss function module is connected with the motion generation module and the data module and is used for receiving the driving image and the generated image, calculating the perceptual loss function and the discriminator loss function, and combining them to output the total loss function.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification, and to any novel method or process step or any novel combination of steps disclosed.

Claims (9)

1. A human body posture migration method based on action driving, characterized by comprising the following steps:
S1: acquiring human body action video data, extracting image frames from the video data to obtain a number of consecutive pictures, and screening the extracted pictures to obtain target pictures;
S2: detecting human body key points in the target pictures to obtain key point coordinates;
S3: randomly extracting two images from the target pictures as a source image and a driving image respectively, and calculating a transformation relation between the driving image and the source image according to the obtained key point coordinates;
S4: inputting the transformation relation into a motion estimation model, and outputting a corresponding optical flow map and redraw map;
S5: inputting the optical flow map and the redraw map into a motion generation model to obtain a posture-migration generated image;
S6: calculating a loss function based on the driving image and the posture-migration generated image, as follows:
S601: calculating a discriminator loss function L_D through a discriminator network model D;
S602: calculating a perceptual loss function L_J through a human body recognition model;
S603: combining the discriminator loss function with the perceptual loss function to output a final overall loss function L.
2. The human body posture migration method according to claim 1, wherein the human body action video data is single-person motion video data, and the screening is deleting video data in which the human body is incomplete.
3. The human body posture migration method according to claim 1, wherein in step S2 the pictures are further re-screened by human body key point detection, deleting data in which no human body key points can be detected or in which key points of multiple persons are detected.
4. The human body posture migration method according to claim 1, wherein in step S5 the motion generation model adopts a generative adversarial network, and the posture-migration generated image is obtained as follows:
inputting the source image into the motion generation model to obtain a hidden-layer feature map of the source image, and splicing the hidden-layer feature map with the optical flow map;
multiplying the obtained splicing result by the redraw map, inputting the product into the decoder of the model, and outputting the posture-migration generated image.
5. The human body posture migration method according to claim 1, wherein in step S601 the discriminator network model adopts a VGG16 model and the loss function adopts a cross-entropy loss function, with the formula:
L_D = -y·log(D(x)) - (1-y)·log(1-D(x))
where x is the input image and y is the image label.
6. The human body posture migration method according to claim 5, wherein in step S602 the perceptual loss function L_J is calculated as follows:
extracting, through a human body recognition model, the hidden-layer features of the human body in the posture-migration generated image and in the driving image;
calculating the difference between the features correspondingly extracted from the posture-migration generated image and the driving image as the perceptual loss function, with the formula:
L_J = ||J(D_g) - J(Q)||
where D_g is the posture-migration generated image and Q is the driving image.
7. The human body posture migration method according to claim 6, wherein the feature difference is the distance between the last-layer hidden feature vectors of the model obtained when the posture-migration generated image and the driving image are respectively input into the model.
8. The human body posture migration method according to claim 6, wherein in step S603 the perceptual loss function and the discriminator loss function are combined as follows:
L = w_1·L_J + w_2·L_D
where w_1 and w_2 are the weight coefficients of the respective loss functions.
9. A human body posture migration system, characterized by comprising a data module, a motion estimation module, a motion generation module and a loss function module;
the data module is used for collecting human body action videos and randomly extracting image frames to obtain source images, driving image data and corresponding human body key point coordinates;
the motion estimation module is connected with the data module and is used for receiving the source image, the driving image and the human body key point coordinate data, and outputting an optical flow map and a redraw map;
the motion generation module is connected with the motion estimation module and the data module and is used for receiving the optical flow map, the redraw map and the source image, splicing the hidden-layer feature map of the source image with the optical flow map, multiplying the splicing result by the redraw map, and finally outputting the posture-migration generated image;
the loss function module is connected with the motion generation module and the data module and is used for receiving the driving image and the generated image, calculating the perceptual loss function and the discriminator loss function, and combining them to output the total loss function.
CN202111304351.8A 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving Pending CN114049652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111304351.8A CN114049652A (en) 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111304351.8A CN114049652A (en) 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving

Publications (1)

Publication Number Publication Date
CN114049652A 2022-02-15

Family

ID=80207224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111304351.8A Pending CN114049652A (en) 2021-11-05 2021-11-05 Human body posture migration method and system based on action driving

Country Status (1)

Country Link
CN (1) CN114049652A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821811A (en) * 2022-06-21 2022-07-29 平安科技(深圳)有限公司 Method and device for generating person composite image, computer device and storage medium
CN114821811B (en) * 2022-06-21 2022-09-30 平安科技(深圳)有限公司 Method and device for generating person composite image, computer device and storage medium
CN114783039A (en) * 2022-06-22 2022-07-22 南京信息工程大学 Motion migration method driven by 3D human body model

Similar Documents

Publication Publication Date Title
CN110276316B (en) Human body key point detection method based on deep learning
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN109508654A (en) Merge the human face analysis method and system of multitask and multiple dimensioned convolutional neural networks
CN112733797B (en) Method, device and equipment for correcting sight of face image and storage medium
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN112926396A (en) Action identification method based on double-current convolution attention
CN114049652A (en) Human body posture migration method and system based on action driving
CN114049381A (en) Twin cross target tracking method fusing multilayer semantic information
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN111401293A (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
US20220351547A1 (en) Gesture analysis method and device, and computer-readable storage medium
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN113570658A (en) Monocular video depth estimation method based on depth convolutional network
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN113610046A (en) Behavior identification method based on depth video linkage characteristics
CN115471611A (en) Method for improving visual effect of 3DMM face model
CN113076905B (en) Emotion recognition method based on context interaction relation
CN111274901B (en) Gesture depth image continuous detection method based on depth gating recursion unit
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
CN113838102B (en) Optical flow determining method and system based on anisotropic dense convolution
CN113255514B (en) Behavior identification method based on local scene perception graph convolutional network
CN115346259A (en) Multi-granularity academic emotion recognition method combined with context information
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN111611997A (en) Cartoon customized image motion video generation method based on human body action migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination