CN114419277A - Image-based step-by-step generation type human body reconstruction method and device - Google Patents

Image-based step-by-step generation type human body reconstruction method and device

Info

Publication number
CN114419277A
Authority
CN
China
Prior art keywords
human body
image
heat map
model
dimensions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210059026.8A
Other languages
Chinese (zh)
Inventor
邝嘉健
郑伟诗
高义朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202210059026.8A
Publication of CN114419277A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image-based step-by-step generative human body reconstruction method and device. The method comprises the following steps: extracting image features of a given human body image, decoding them to generate heat maps in the x, y and z directions, concatenating the heat maps, encoding them, separating them again, and regressing the human body joint point coordinates by heat map integration; passing the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the joint points and the vertices of the human body mesh model, followed by upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step; and inputting the vertex coordinates of the human body mesh model into a general SMPL model regressor, whose output joint point coordinates serve as reconstruction constraints on the human body mesh model. Building on three-dimensional human pose estimation, the invention introduces an attention mechanism to optimize the heat map distributions of the different directions and generates the human body mesh model progressively, from coarse to fine.

Description

Image-based step-by-step generation type human body reconstruction method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a step-by-step generation type human body reconstruction method and device based on images.
Background
Existing deep-learning-based human body reconstruction techniques fall into two main types. The first type is three-dimensional human body reconstruction based on a parameterized model: a neural network estimates the human body model parameters, and the parameterized model directly generates a three-dimensional human body mesh model. The second type does not use a parameterized model to generate the mesh; instead, it directly regresses the vertex coordinates of the three-dimensional human body model from image feature information.
Direct regression of the vertex coordinates from image feature information is generally implemented in one of two ways. In the first, a neural network estimates the heat map distributions of the three-dimensional human pose and of the three-dimensional human body mesh model along the x, y and z axes, and heat map integration is then used to regress the joint point coordinates and the mesh vertex coordinates. In the second, a human body model template is used, and the three-dimensional mesh coordinates of the human body model are regressed step by step through a Transformer with progressive dimensionality reduction.
In the prior art, the main disadvantage of the first approach is that the heat map of each direction dimension is estimated relatively independently, so the correlation between the heat maps of different direction dimensions is lost; the main disadvantage of the second approach is that it requires a human body model template, involves a large amount of computation, and takes a long time to train.
Disclosure of Invention
The main purpose of the invention is to overcome the defects of the prior art by providing an image-based step-by-step generative human body reconstruction method and device. An attention mechanism is introduced to learn the relationship between the heat map distributions of the different direction dimensions, and three-dimensional human body joint point estimation and human body mesh model reconstruction are optimized through information interaction between these heat maps. Meanwhile, based on the heat map information of the human body joint points, the vertex coordinates of the human body mesh model are regressed step by step from coarse to fine, so that the model does not depend on a human body model template and both the amount of computation and the training time are reduced.
In order to achieve this purpose, the invention adopts the following technical scheme:
the invention provides an image-based step-by-step generative human body reconstruction method, which comprises the following steps:
extracting image features of a given human body image, decoding the image features to generate heat maps in the x, y and z direction dimensions, concatenating the heat map distributions of the three direction dimensions, encoding them, separating them again into heat map distributions in the x, y and z direction dimensions, and regressing the coordinates of the human body joint points by heat map integration;
passing the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the human body joint points and the vertices of the human body mesh model, followed by upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step;
and inputting the vertex coordinates of the human body mesh model into a general SMPL model regressor, and outputting the human body joint point coordinates corresponding to the human body model, which serve as reconstruction constraints on the human body mesh model.
As a preferred technical solution, extracting the image features of the given human body image specifically includes:
given a human body image, cropping it based on an annotated bounding box or a detection box to remove background influence and keep only the human body, and resizing the cropped image to match the subsequent neural network processing;
inputting the processed image into an encoder for feature extraction to obtain human body image features $F_P$, whose feature dimension is $c \times h \times w$;
for the obtained human body image features, first applying a deconvolution operation in the x-axis and y-axis directions (operator shown in Figure BDA0003477483590000021) to raise the feature map $F_P$ to dimension $c' \times 8h \times 8w$. Then, for the x-axis direction, averaging over the y-axis dimension ($\mathrm{avg}_y$) and applying a one-dimensional convolution (Figure BDA0003477483590000022) to obtain the heat map distribution in the x-axis direction, $P_{H,x} \in \mathbb{R}^{J \times 8h}$; similarly, for the y-axis direction, averaging over the x-axis dimension ($\mathrm{avg}_x$) and applying a one-dimensional convolution (Figure BDA0003477483590000023) to obtain the heat map distribution in the y-axis direction, $P_{H,y} \in \mathbb{R}^{J \times 8w}$:
Figure BDA0003477483590000024
Figure BDA0003477483590000025
For the z-axis direction, averaging over both the x-axis and y-axis dimensions ($\mathrm{avg}_{x,y}$), then applying a one-dimensional convolution $f_p$ and a feature reshaping operation (Figure BDA0003477483590000026) to convert the feature dimension to $c' \times D$, and finally applying a one-dimensional convolution (Figure BDA0003477483590000027) to obtain the heat map distribution in the z-axis direction, $P_{H,z} \in \mathbb{R}^{J \times D}$:
Figure BDA0003477483590000028
As a preferred technical solution, concatenating the heat map distributions in the x, y and z direction dimensions and separating them again after encoding by a Transformer encoder is specifically:
the heat map distributions of the three direction dimensions, $P_{H,x}, P_{H,y}, P_{H,z} \in \mathbb{R}^{J \times 64}$, are fused along the last dimension to obtain the fused feature $P_H = [P_{H,x}, P_{H,y}, P_{H,z}] \in \mathbb{R}^{J \times 192}$, which contains the heat map distribution information of all three direction dimensions and is used as the input of the Transformer encoder;
the fused feature $P_H$ is input into N layers of attention modules for heat map information interaction between different direction dimensions and different joint points; each attention module performs four operations in turn (multi-head attention, residual connection and regularization, feedforward network processing, residual connection and regularization), and the final output is a heat map distribution $P'_H \in \mathbb{R}^{J \times 192}$;
the output heat map distribution $P'_H$ is passed through the independent fully connected layers $fc_x$, $fc_y$ and $fc_z$ for feature mapping and summed with the original heat map distributions $P_{H,x}$, $P_{H,y}$ and $P_{H,z}$, which re-separates it into heat map distributions of the three direction dimensions, $P'_{H,x}$, $P'_{H,y}$ and $P'_{H,z}$:
$P'_{H,x} = P_{H,x} + fc_x(P'_H)$
$P'_{H,y} = P_{H,y} + fc_y(P'_H)$
$P'_{H,z} = P_{H,z} + fc_z(P'_H)$
for the heat map distributions of the three direction dimensions of each joint point, the coordinates are regressed using soft-argmax to obtain $P'_{C,x}, P'_{C,y}, P'_{C,z} \in \mathbb{R}^{J \times 1}$, which are then concatenated to obtain the final human body joint point coordinates $P'_C = [P'_{C,x}, P'_{C,y}, P'_{C,z}] \in \mathbb{R}^{J \times 3}$.
As a preferred technical scheme, the soft-argmax is defined as follows:
Figure BDA0003477483590000031
As a preferred technical scheme, the information interaction between the human body joint points and the vertices of the human body mesh model is specifically:
given the heat map feature $F_H = [P'_{H,x}, P'_{H,y}, P'_{H,z}] \in \mathbb{R}^{J \times 192}$, the feature $F_{embed} = F_H + PE$ is obtained by position embedding, where the position embedding PE is defined as follows:
Figure BDA0003477483590000032
where pos is the position, i is the feature dimension index, and $d_{model}$ is the feature dimension;
the position-embedded feature $F_{embed}$ is input into a Transformer encoder, which comprises N blocks, each containing a multi-head attention module and a feedforward neural network; in each block, $F_{embed}$ first passes through the multi-head attention module, which computes normalized attention weights, and then through the feedforward neural network for feature transformation; the output after N blocks (Figure BDA0003477483590000037) is the vertex heat map distribution of a human body model with V vertices.
As a preferred technical solution, the upsampling of the human body model vertices is specifically:
the output of the Transformer encoder (Figure BDA0003477483590000033) is input to a 1 × 1 convolutional layer for an upsampling operation, giving the output (Figure BDA0003477483590000034). That is, the 1 × 1 convolution doubles the current number of model vertices, thereby refining the model.
As a preferred technical solution, inputting the vertex coordinates of the human body mesh model into the general SMPL model regressor and outputting the human body joint point coordinates corresponding to the human body model, so as to complete the human body reconstruction, is specifically:
Figure BDA0003477483590000035
where SMPL.J_regressor $\in \mathbb{R}^{24 \times 6890}$ is the joint regression matrix of the SMPL model, $M'_c \in \mathbb{R}^{6890 \times 3}$ is the human body mesh model estimated by the above method, and the quantity shown in Figure BDA0003477483590000036 denotes the human body joint point coordinates obtained by regression.
The invention also provides an image-based step-by-step generative human body reconstruction system, which applies the human body reconstruction method described above and comprises a human body posture estimation module, a human body mesh generation module and a human body joint point regression module;
the human body posture estimation module is used for extracting image features of a given human body image, decoding the image features to generate heat maps in the x, y and z direction dimensions, concatenating the heat map distributions of the three direction dimensions, encoding them, separating them again into heat map distributions in the x, y and z direction dimensions, and regressing the coordinates of the human body joint points by heat map integration;
the human body mesh generation module is used for passing the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the human body joint points and the vertices of the human body mesh model, followed by upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step;
and the human body joint point regression module is used for inputting the vertex coordinates of the human body mesh model into the general SMPL model regressor and outputting the human body joint point coordinates corresponding to the human body model, which serve as reconstruction constraints on the human body mesh model.
Yet another aspect of the present invention provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the image-based step-wise generation human reconstruction method.
Still another aspect of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the image-based progressive generation type human body reconstruction method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention introduces an attention mechanism so that human pose estimation and body shape reconstruction can better exploit the correlation between the heat map distributions of the different direction dimensions, allowing those distributions to be fine-tuned. Existing schemes estimate the human pose and reconstruct the body shape directly from single-direction heat maps, so the interaction between the different direction dimensions is lost.
2. The invention adopts a step-by-step, coarse-to-fine generative human body reconstruction scheme, which removes the dependence on a human body model template and reduces the time required for training; existing methods depend on a human body model template, involve a large amount of computation and take a long time to train.
3. Compared with a scheme that generates the mesh directly, the step-by-step, coarse-to-fine generative human body reconstruction scheme reduces the GPU memory required for training.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a human body reconstruction method based on a step-by-step generation of images according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of human body posture estimation according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating heat map information interaction between different orientation dimensions and different joints according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of human body mesh generation according to an embodiment of the invention;
FIG. 5 is a schematic structural diagram of an image-based human body reconstruction system according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the embodiments of the present invention and the accompanying drawings, it should be understood that the drawings are for illustrative purposes only and are not to be construed as limiting the patent. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
As shown in FIG. 1, the image-based step-by-step generative human body reconstruction method of the present embodiment mainly includes the following three steps: (1) human body posture estimation; (2) human body mesh generation; (3) human body joint point regression. On one hand, an attention mechanism is introduced to learn the relationship between the heat map distributions of the different direction dimensions, and three-dimensional human body joint point estimation and human body mesh model reconstruction are optimized through information interaction between these heat maps; on the other hand, based on the heat map information of the human body joint points, the vertex coordinates of the human body mesh model are regressed step by step from coarse to fine, so that the model is independent of a human body model template and both the amount of computation and the training time are reduced.
The image-based human body reconstruction method based on the step-by-step generation formula will be detailed in the following according to a specific workflow:
(1) estimating the posture of the human body;
As shown in FIG. 2, the human body posture estimation module is implemented with an encoder, a decoder and a Transformer encoder. Given a human body image I as input, the encoder extracts image features F from I; the decoder decodes F into heat map distributions in the x, y and z direction dimensions; these distributions are concatenated and input into an N-layer Transformer encoder; the output is separated back into heat map distributions in the x, y and z direction dimensions; and the coordinates of the human body joint points are then regressed by heat map integration.
Further, the specific process of the human body posture estimation is as follows:
(1.1) extracting human body image features;
(1.1.1) Given a human body image I, the image is cropped based on an annotated bounding box or a detection box to remove background influence and keep only the human body, and the cropped image is resized to match the subsequent neural network processing.
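The cropping and resizing step can be illustrated with a minimal Python sketch. The function name `crop_and_resize`, the `(top, left, bottom, right)` box layout and the nearest-neighbour resampling are assumptions made for illustration only; the text does not specify how the box is encoded or which interpolation is used.

```python
def crop_and_resize(image, box, out_h, out_w):
    """Crop `image` (a list of pixel rows) to `box`, then resize the crop.

    `box` is (top, left, bottom, right) with bottom/right exclusive.
    Both the box layout and nearest-neighbour sampling are assumptions.
    """
    top, left, bottom, right = box
    crop = [row[left:right] for row in image[top:bottom]]
    in_h, in_w = len(crop), len(crop[0])
    # Map every output pixel back to its nearest source pixel in the crop.
    return [
        [crop[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy 6x6 "image" whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(6)] for r in range(6)]
patch = crop_and_resize(image, box=(1, 2, 5, 6), out_h=2, out_w=2)
```

In practice the output size would be whatever the encoder expects; the 2 × 2 target here only keeps the toy example readable.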
(1.1.2) The processed image is input into an encoder for feature extraction to obtain human body image features $F_P$, whose feature dimension is $c \times h \times w$.
For the obtained human body image features, a deconvolution operation is first applied in the x-axis and y-axis directions (operator shown in Figure BDA0003477483590000061) to raise the feature map $F_P$ to dimension $c' \times 8h \times 8w$. Then, for the x-axis direction, the y-axis dimension is averaged ($\mathrm{avg}_y$) and a one-dimensional convolution (Figure BDA0003477483590000062) is applied to obtain the heat map distribution in the x-axis direction, $P_{H,x} \in \mathbb{R}^{J \times 8h}$; similarly, for the y-axis direction, the x-axis dimension is averaged ($\mathrm{avg}_x$) and a one-dimensional convolution (Figure BDA0003477483590000063) is applied to obtain the heat map distribution in the y-axis direction, $P_{H,y} \in \mathbb{R}^{J \times 8w}$:
Figure BDA0003477483590000064
Figure BDA0003477483590000065
For the z-axis direction, both the x-axis and y-axis dimensions are averaged ($\mathrm{avg}_{x,y}$), a one-dimensional convolution $f_p$ and a feature reshaping operation (Figure BDA0003477483590000066) convert the feature dimension to $c' \times D$, and a final one-dimensional convolution (Figure BDA0003477483590000067) gives the heat map distribution in the z-axis direction, $P_{H,z} \in \mathbb{R}^{J \times D}$:
Figure BDA0003477483590000068
In the specific implementation of the present embodiment, $8h = 8w = D = 64$.
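The axis-wise averaging described above can be sketched in plain Python on nested lists. The helper names `axis_profiles` and `pointwise_conv1d` are hypothetical, and the kernel size of 1 for the one-dimensional convolution is an assumption, since the text does not state the kernel size. The reshaping of the z-branch to $c' \times D$ is omitted. As in the text, the first spatial axis is treated as the x direction.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def axis_profiles(feat):
    """Collapse a c' x A x B feature volume into per-axis 1D profiles.

    Returns (x_profile, y_profile, z_profile) with shapes c' x A, c' x B, c'.
    """
    x_prof = [[mean(row) for row in ch] for ch in feat]           # average over B
    y_prof = [[mean(col) for col in zip(*ch)] for ch in feat]     # average over A
    z_prof = [mean(v for row in ch for v in row) for ch in feat]  # average over both
    return x_prof, y_prof, z_prof


def pointwise_conv1d(profile, weight):
    """Kernel-size-1 convolution (assumed): mix c' channels into J joint channels."""
    length = len(profile[0])
    return [
        [sum(w * profile[c][t] for c, w in enumerate(w_row)) for t in range(length)]
        for w_row in weight
    ]


feat = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, spatial size 2 x 2
    [[2.0, 2.0], [4.0, 8.0]],  # channel 1
]
x_prof, y_prof, z_prof = axis_profiles(feat)
weight = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # J = 3 joints from c' = 2 channels
heat_x = pointwise_conv1d(x_prof, weight)      # shape J x A
```

With $8h = 8w = 64$ the same operations would yield per-joint heat maps of length 64 along each axis.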
Furthermore, a residual network such as ResNet50 can be used for feature extraction. ResNet50 has two basic blocks, the Conv Block and the Identity Block. The input and output dimensions of a Conv Block differ, so Conv Blocks cannot be chained directly; their function is to change the dimensions of the network. The input and output dimensions of an Identity Block are the same, so Identity Blocks can be chained to deepen the network. In this embodiment, ResNet50 extracts human body features well; of course, feature extraction in the present application is not limited to ResNet50, and any other residual network that can implement the technical solution of the present application is applicable and is not described further here.
(1.2) generating x, y and z direction dimension heat maps;
(1.2.1) The extracted human body image features F are input into a decoder, which outputs multi-dimensional heat map distributions for the three direction dimensions x, y and z; in this embodiment, 64-dimensional heat map distributions $P_{H,x}, P_{H,y}, P_{H,z} \in \mathbb{R}^{J \times 64}$ are output for the J joint points.
(1.3) exchanging heat map information of different direction dimensions;
(1.3.1) The heat map distributions of the three direction dimensions, $P_{H,x}, P_{H,y}, P_{H,z} \in \mathbb{R}^{J \times 64}$, are concatenated along the last dimension to obtain the fused feature $P_H = [P_{H,x}, P_{H,y}, P_{H,z}] \in \mathbb{R}^{J \times 192}$. This feature contains the heat map distribution information of all three direction dimensions and is used as the input of the Transformer encoder.
(1.3.2) The fused feature $P_H$ is input into N layers of attention modules (see FIG. 3) for heat map information interaction between different direction dimensions and different joint points. Each attention module performs four operations in sequence (multi-head attention, residual connection and regularization, feedforward network processing, residual connection and regularization), and the final output is a heat map distribution $P'_H \in \mathbb{R}^{J \times 192}$.
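One attention layer with the four operations listed above can be sketched as follows. This is a simplified illustration under several assumptions not stated in the text: a single attention head stands in for multi-head attention, "regularization" is taken to mean layer normalization as in the standard Transformer encoder, the feedforward network uses a ReLU activation, and all names (`encoder_layer`, `add_norm`, the weight arguments) are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(row, eps=1e-5):
    mu = sum(row) / len(row)
    var = sum((v - mu) ** 2 for v in row) / len(row)
    return [(v - mu) / math.sqrt(var + eps) for v in row]


def add_norm(x, y):
    """Residual connection followed by layer normalization (assumed)."""
    return [layer_norm([a + b for a, b in zip(rx, ry)]) for rx, ry in zip(x, y)]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the J joint tokens."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights


def feed_forward(x, w1, w2):
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]  # ReLU (assumed)
    return matmul(hidden, w2)


def encoder_layer(x, wq, wk, wv, w1, w2):
    attn, weights = self_attention(x, wq, wk, wv)
    x = add_norm(x, attn)                     # residual connection + normalization
    x = add_norm(x, feed_forward(x, w1, w2))  # feedforward, then residual + normalization
    return x, weights


# Toy input: J = 3 joint tokens with 4 features each (the text uses 192 features).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
tokens = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.0, 1.0, 0.0, 1.0]]
out, attn = encoder_layer(tokens, identity, identity, identity, identity, identity)
```

Because each row of the fused feature holds the x, y and z heat maps of one joint, the attention weights mix information across joints while the projections mix information across the three direction dimensions.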
(1.3.3) The output heat map distribution $P'_H$ is passed through the independent fully connected layers $fc_x$, $fc_y$ and $fc_z$ for feature mapping and summed with the original heat map distributions $P_{H,x}$, $P_{H,y}$ and $P_{H,z}$, which re-separates it into heat map distributions of the three direction dimensions, $P'_{H,x}$, $P'_{H,y}$ and $P'_{H,z}$:
$P'_{H,x} = P_{H,x} + fc_x(P'_H)$
$P'_{H,y} = P_{H,y} + fc_y(P'_H)$
$P'_{H,z} = P_{H,z} + fc_z(P'_H)$
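The concatenation and residual re-separation can be sketched as below. The helper names (`fuse`, `linear`, `split_with_residual`) are hypothetical, the fully connected layers are modelled as bias-free matrices, and the hand-picked weights are placeholders for learned parameters. For brevity the fused feature itself stands in for the encoder output $P'_H$.

```python
def linear(rows, weight):
    """Apply y = x @ weight to every row; weight is in_dim x out_dim, no bias."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)] for row in rows]


def fuse(p_x, p_y, p_z):
    """Concatenate the per-joint heat maps along the last dimension."""
    return [rx + ry + rz for rx, ry, rz in zip(p_x, p_y, p_z)]


def split_with_residual(p_fused, p_x, p_y, p_z, w_x, w_y, w_z):
    """Compute P'_{H,a} = P_{H,a} + fc_a(P'_H) for a in {x, y, z}."""
    out = []
    for p_axis, w in ((p_x, w_x), (p_y, w_y), (p_z, w_z)):
        mapped = linear(p_fused, w)
        out.append([[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(p_axis, mapped)])
    return out


# J = 2 joints with 2 bins per axis (the text uses 64 bins, i.e. 192 fused features).
p_x = [[1.0, 0.0], [0.0, 1.0]]
p_y = [[0.5, 0.5], [0.2, 0.8]]
p_z = [[0.0, 1.0], [1.0, 0.0]]
p_h = fuse(p_x, p_y, p_z)  # J x 6, standing in for the encoder output P'_H
zero_w = [[0.0, 0.0] for _ in range(6)]
select_x = [[1.0, 0.0], [0.0, 1.0]] + [[0.0, 0.0] for _ in range(4)]
new_x, new_y, new_z = split_with_residual(p_h, p_x, p_y, p_z, select_x, zero_w, zero_w)
```

Each fully connected layer maps the 192-dimensional fused feature back to the 64 bins of one axis, so the residual sum refines the original per-axis heat map rather than replacing it.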
(1.3.4) For the heat map distributions of the three direction dimensions of each joint point, the coordinates are regressed using soft-argmax to obtain $P'_{C,x}, P'_{C,y}, P'_{C,z} \in \mathbb{R}^{J \times 1}$, which are then concatenated to obtain the final human body joint point coordinates $P'_C = [P'_{C,x}, P'_{C,y}, P'_{C,z}] \in \mathbb{R}^{J \times 3}$.
Further, in the present embodiment, soft-argmax is defined as follows:
Figure BDA0003477483590000071
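Since the formula itself is only available as an image, the following Python sketch uses the commonly used form of soft-argmax, in which the coordinate is the expectation of the bin index under a softmax of the heat map. This is an assumption and may differ in detail from the definition in the figure; the temperature parameter `beta` and the function names are hypothetical.

```python
import math


def soft_argmax(heatmap, beta=1.0):
    """Expected bin index under softmax(beta * heatmap); differentiable argmax."""
    m = max(heatmap)  # subtract the max for numerical stability
    exps = [math.exp(beta * (h - m)) for h in heatmap]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


def joints_from_heatmaps(p_x, p_y, p_z):
    """Regress one (x, y, z) coordinate per joint from three J x L heat maps."""
    return [
        [soft_argmax(hx), soft_argmax(hy), soft_argmax(hz)]
        for hx, hy, hz in zip(p_x, p_y, p_z)
    ]


symmetric = soft_argmax([0.0, 1.0, 0.0])       # mass centred on index 1
peaked = soft_argmax([0.0, 0.0, 50.0, 0.0])    # nearly one-hot at index 2
coords = joints_from_heatmaps(
    [[0.0, 1.0, 0.0]], [[0.0, 0.0, 50.0, 0.0]], [[5.0, 5.0]]
)
```

Unlike a hard argmax, this expectation is differentiable, which is what allows the heat maps to be trained end to end from coordinate losses.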
(1.3.5) The human body posture estimation module uses a joint loss and a bone loss as training objectives, defined respectively as follows:
$L_{joint} = \|P'_C - P_C\|_1$
Figure BDA0003477483590000072
where $P_C$ is the ground-truth data and $P'_C$ is the network prediction.
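The two losses can be sketched as follows. The joint loss follows the L1 definition given in the text. The bone loss formula is only available as an image, so the version below, an L1 distance between bone vectors (child joint minus parent joint), is an assumption based on a common formulation; the `bones` list of (parent, child) index pairs is likewise hypothetical.

```python
def l1_distance(pred, target):
    """Sum of absolute differences over all entries of two equally shaped lists."""
    return sum(abs(p - t) for rp, rt in zip(pred, target) for p, t in zip(rp, rt))


def joint_loss(pred, target):
    """L_joint = || P'_C - P_C ||_1, as defined in the text."""
    return l1_distance(pred, target)


def bone_vectors(joints, bones):
    return [[c - p for p, c in zip(joints[parent], joints[child])] for parent, child in bones]


def bone_loss(pred, target, bones):
    """Assumed form: L1 distance between predicted and ground-truth bone vectors."""
    return l1_distance(bone_vectors(pred, bones), bone_vectors(target, bones))


# Toy 3-joint chain; the prediction is the ground truth shifted by 0.5 along x.
target = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]]
pred = [[0.5, 0.0, 0.0], [0.5, 1.0, 0.0], [0.5, 2.0, 0.0]]
bones = [(0, 1), (1, 2)]  # hypothetical (parent, child) pairs
l_joint = joint_loss(pred, target)
l_bone = bone_loss(pred, target, bones)
```

In this toy case the joint loss penalizes the global shift while the assumed bone loss is zero, which illustrates why the two terms are complementary: one constrains absolute positions, the other constrains the relative structure of the skeleton.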
(2) Generating a human body mesh model;
As shown in FIG. 4, the intermediate output $P'_H = [P'_{H,x}, P'_{H,y}, P'_{H,z}] \in \mathbb{R}^{J \times 192}$ obtained in step (1) undergoes the following steps several times: first, the heat map features are input into a Transformer encoder for information interaction; next, they are input into a 1 × 1 convolutional network for mesh vertex upsampling, which increases the number of vertices of the mesh model; finally, batch regularization is applied to the heat map features. In this way the final human body mesh model is generated step by step.
The specific process of generating the human body mesh model is as follows:
Here, assume that the heat map feature input to a single layer of this operation is $F_H \in \mathbb{R}^{B \times V \times 192}$, where B is the batch size, V is the sum of the current number of joint points and model vertices, and 192 is the current feature dimension.
(2.1) interacting joint points and model vertex information;
(2.1.1) Given the heat map feature $F_H = [P'_{H,x}, P'_{H,y}, P'_{H,z}] \in \mathbb{R}^{J \times 192}$, the feature $F_{embed} = F_H + PE$ is obtained by position embedding, where the position embedding PE is defined as follows:
Figure BDA0003477483590000073
Figure BDA0003477483590000081
where pos is the position, i is the feature dimension index, and $d_{model}$ is the feature dimension (set to 256).
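The position embedding formulas are only available as images. The symbols pos, i and $d_{model}$ match the sinusoidal encoding of the original Transformer, so the sketch below assumes that form: $PE(pos, 2i) = \sin(pos / 10000^{2i/d_{model}})$ and $PE(pos, 2i+1) = \cos(pos / 10000^{2i/d_{model}})$. The function names are hypothetical.

```python
import math


def positional_embedding(num_positions, d_model):
    """Sinusoidal position embedding (assumed form), shape num_positions x d_model."""
    pe = [[0.0] * d_model for _ in range(num_positions)]
    for pos in range(num_positions):
        for i in range(0, d_model, 2):  # i is the even index 2k of a sin/cos pair
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def embed(features, pe):
    """F_embed = F_H + PE, added element-wise per token."""
    return [[f + p for f, p in zip(fr, pr)] for fr, pr in zip(features, pe)]


pe = positional_embedding(num_positions=3, d_model=4)
f_embed = embed([[1.0, 1.0, 1.0, 1.0]] * 3, pe)
```

The embedding gives each joint or vertex token a distinct signature, so the attention layers can distinguish tokens whose heat map features happen to be similar.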
The position-embedded feature $F_{embed}$ is input into the Transformer encoder, which comprises N blocks, each containing a multi-head attention module and a feedforward neural network (see FIG. 3). In each block, $F_{embed}$ first passes through the multi-head attention module, which computes normalized attention weights, and then through the feedforward neural network for feature transformation; the output after N blocks (Figure BDA0003477483590000082) is the vertex heat map distribution of a human body model with V vertices.
(2.2) model vertex upsampling;
(2.2.1) The output of the Transformer encoder (Figure BDA0003477483590000083) is input to a 1 × 1 convolutional layer for an upsampling operation, giving the output (Figure BDA0003477483590000084). That is, the 1 × 1 convolution doubles the current number of model vertices, thereby refining the model.
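One way to read this step is that the vertex axis is treated as the channel axis of the 1 × 1 convolution, so the layer amounts to a learned linear map from V to 2V vertices applied independently to every feature dimension. The sketch below follows that reading, which is an assumption; the function name `upsample_vertices` is hypothetical and the hand-written weights stand in for learned parameters.

```python
def upsample_vertices(features, weight):
    """Map V x C vertex features to 2V x C with a (2V) x V weight matrix.

    This is what a 1 x 1 convolution computes when the vertex axis is used
    as the channel axis (an assumed interpretation of the text).
    """
    num_feats = len(features[0])
    return [
        [sum(w * features[v][c] for v, w in enumerate(w_row)) for c in range(num_feats)]
        for w_row in weight
    ]


# V = 2 coarse vertices with C = 3 features each.
coarse = [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0]]
# Illustrative weights: keep each coarse vertex and add interpolated ones.
weight = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5]]
fine = upsample_vertices(coarse, weight)
```

Each new vertex is a weighted combination of the existing ones, so repeated application grows the mesh from a coarse set of tokens toward the full vertex count.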
(2.3) batch regularization;
(2.3.1) Batch regularization is applied to the upsampled vertex heat map (Figure BDA0003477483590000085).
In summary, each pass through steps (2.1) to (2.3) doubles the number of vertices of the human body model, which realizes the coarse-to-fine construction of the model. Finally, as in the human body posture estimation module, the output heat map distribution $F'_H$ is re-separated into heat map distributions of the three direction dimensions, $F'_{H,x}$, $F'_{H,y}$ and $F'_{H,z}$, and the coordinates are regressed using soft-argmax. A vertex loss is introduced as the optimization objective: $L_{vertex} = \|V'_C - V_C\|_1$, where $V_C$ is the ground-truth data and $V'_C$ is the network prediction.
(3) Human body joint point regression;
Taking the human body mesh model $V'_C$ as input, the general SMPL model regressor outputs the human body joint point coordinates corresponding to the human body model (Figure BDA0003477483590000086). A joint loss is then introduced to constrain the generation of the human body model.
Specifically:
Figure BDA0003477483590000087
where SMPL.J_regressor $\in \mathbb{R}^{24 \times 6890}$ is the joint regression matrix of the SMPL model, $M'_c \in \mathbb{R}^{6890 \times 3}$ is the human body mesh model estimated by the above method, and the quantity shown in Figure BDA0003477483590000088 denotes the human body joint point coordinates obtained by regression.
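Given the stated shapes, the regression can be read as a matrix product of the 24 × 6890 joint regression matrix with the 6890 × 3 vertex matrix, yielding 24 joint coordinates. The sketch below illustrates this on toy sizes; the exact formula is only available as an image, so the matrix-product reading is an assumption, and the function name `regress_joints` and the toy weights are hypothetical.

```python
def regress_joints(j_regressor, vertices):
    """joints = J_regressor @ vertices, shapes (J x V) @ (V x 3) -> J x 3."""
    return [
        [sum(w * v[k] for w, v in zip(w_row, vertices)) for k in range(3)]
        for w_row in j_regressor
    ]


# Toy mesh with V = 4 vertices and J = 2 joints (SMPL uses V = 6890, J = 24).
vertices = [
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]
# Each joint is a weighted average of vertices; each row sums to 1.
j_regressor = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
joints = regress_joints(j_regressor, vertices)
```

Because the regressed joints are a fixed linear function of the predicted vertices, comparing them with ground-truth joints gives a loss that constrains the mesh without requiring a template.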
Further, in the above-mentioned case,
Figure BDA0003477483590000089
The human pose estimation and body shape reconstruction results of the technical scheme of the invention are more accurate: the attention mechanism processes the heat map distributions of the different direction dimensions, so the heat map information of the different direction dimensions is used more effectively to guide human pose estimation and body shape reconstruction. The model training time is also shorter: the step-by-step, coarse-to-fine generation scheme requires less computation than methods that directly adjust and optimize a human body model template, which shortens model training.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present invention.
Based on the same idea as the image-based step-by-step generation human body reconstruction method in the above embodiment, the invention also provides an image-based step-by-step generation human body reconstruction system, which can be used to execute the above method. For convenience of explanation, the schematic structural diagram of the system embodiment shows only the parts related to the embodiment of the present invention. Those skilled in the art will understand that the illustrated structure does not limit the apparatus, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.
As shown in fig. 6, in another embodiment of the present application, there is provided an image-based step-by-step generation human body reconstruction system 100, which includes a human body posture estimation module 101, a human body mesh generation module 102, and a human body joint point regression module 103.
The human body posture estimation module 101 is configured to extract image features of a given human body image, decode the image features to generate heat maps of the three directional dimensions x, y, and z, connect the heat map distributions of the three dimensions, separate them again after encoding, and perform heat map integration to obtain the coordinates of the human body joint points;
the human body mesh generation module 102 is configured to perform information interaction between the human body joint points and the human body mesh model vertices on the heat map distribution of the human body joint points through a Transformer encoder, upsample the human body mesh model vertices, and, after batch regularization, generate the final human body mesh model step by step;
the human body joint point regression module 103 is configured to input the coordinates of the human body joint points into the human body mesh model and, after the general SMPL model regressor, output the human body joint point coordinates corresponding to the human body model, as reconstruction constraints of the human body mesh model.
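The connect-then-separate step of the human body posture estimation module can be sketched as below, using the residual form stated in claim 3, $P'_{H,a} = P_{H,a} + fc_a(P'_H)$ for each axis $a$. This is only an illustrative NumPy sketch: the Transformer encoder is omitted (any array of the fused shape can stand in for its output), the fully connected layers are plain matrices and bias vectors, and the function and parameter names are assumptions.

```python
import numpy as np


def fuse_heatmaps(hx, hy, hz):
    """Concatenate per-axis heat maps (J, D) along the last axis -> (J, 3*D)."""
    return np.concatenate([hx, hy, hz], axis=-1)


def reseparate(encoded, originals, weights, biases):
    """Apply P'_{H,a} = P_{H,a} + fc_a(P'_H) for each axis a in (x, y, z).

    encoded:   (J, 3*D) encoder output P'_H.
    originals: three (J, D) arrays, the heat maps before fusion.
    weights:   three (3*D, D) matrices (hypothetical fc_x, fc_y, fc_z kernels).
    biases:    three (D,) vectors.
    """
    return [p + encoded @ w + b for p, w, b in zip(originals, weights, biases)]
```

The residual sum means each fully connected layer only has to learn a correction to the original per-axis heat map, using information gathered from the other two axes and from the other joints.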
The three modules of the system are trained in two stages. The first stage trains the parameters of the human body posture estimation module: given an input data set, the human body joint point coordinates $P'_C$ are obtained through the human body posture estimation process, and the joint loss and the bone loss are used as training objectives, defined respectively as follows: $L_{joint} = \|P'_C - P_C\|_1$
Figure BDA0003477483590000091
Figure BDA0003477483590000092
where $P_C$ is the ground-truth data and $P'_C$ is the network prediction. In this stage, the data flow does not pass through the human body mesh generation module or the human body joint point regression module. The second stage trains the human body mesh generation module and the human body joint point regression module. The parameters of the human body posture estimation module are fixed at this point, and the heat map distributions in the x, y and z directions estimated by the human body posture estimation module are input into the human body mesh generation module, which estimates the human body mesh model $M'_C$. $M'_C$ is then passed through the joint regression module, which outputs the regressed joint coordinates
Figure BDA0003477483590000093
In this stage, the vertex loss and the joint regression loss are used as training objectives, defined respectively as follows: $L_{vertex} = \|M'_C - M_C\|_1$ and
Figure BDA0003477483590000094
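The two-stage schedule can be summarized in a small Python sketch. It only records which modules are updated and which losses are active in each stage, plus the L1 form shared by $L_{joint}$ and $L_{vertex}$; the module labels, the dictionary layout and the helper names are hypothetical, and the bone loss and joint regression loss are named but not implemented because their formulas are not reproduced in the text.

```python
import numpy as np

# Hypothetical labels for the three modules and the losses named in the text.
TRAINING_STAGES = {
    1: {"trainable": {"pose_estimation"},
        "losses": ("joint", "bone")},
    2: {"trainable": {"mesh_generation", "joint_regression"},
        "losses": ("vertex", "joint_regression")},
}


def is_updated(module, stage):
    """True if the module's parameters are updated in the given stage."""
    return module in TRAINING_STAGES[stage]["trainable"]


def l1_loss(pred, target):
    """L1 norm of the difference, the form used by L_joint and L_vertex."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.abs(diff).sum())
```

In a real training loop, modules for which `is_updated` returns false in stage 2 would have their parameters excluded from the optimizer, matching the fixed posture estimation parameters described above.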
It should be noted that the image-based step-by-step generation human body reconstruction system of the present invention corresponds one-to-one to the image-based step-by-step generation human body reconstruction method of the present invention, and the technical features and advantages described in the above method embodiment all apply to the system embodiment.
In addition, in the implementation of the image-based step-by-step generation human body reconstruction system of the above embodiment, the logical division into program modules is only an example. In practical applications, the functions may be allocated to different program modules as needed, for example to meet the configuration requirements of the corresponding hardware or for convenience of software implementation; that is, the internal structure of the system may be divided into different program modules to perform all or part of the functions described above.
As shown in fig. 6, in an embodiment, an electronic device for implementing a human body reconstruction method based on image stepwise generation is provided, and the electronic device 200 may include a first processor 201, a first memory 202 and a bus, and may further include a computer program, such as a human body reconstruction program 203 based on image stepwise generation, stored in the first memory 202 and executable on the first processor 201.
The first memory 202 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The first memory 202 may in some embodiments be an internal storage unit of the electronic device 200, such as a removable hard disk of the electronic device 200. The first memory 202 may also be an external storage device of the electronic device 200 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 200. Further, the first memory 202 may also include both an internal storage unit and an external storage device of the electronic device 200. The first memory 202 may be used not only to store application software installed in the electronic device 200 and various types of data, such as codes of the human body reconstruction program 203 of the image-based step-by-step generation type, etc., but also to temporarily store data that has been output or is to be output.
The first processor 201 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The first processor 201 is the control unit of the electronic device; it connects the components of the whole electronic device through various interfaces and lines, and executes the functions of the electronic device 200 and processes its data by running or executing the programs or modules stored in the first memory 202 (e.g., the image-based step-by-step generation human body reconstruction program 203) and calling the data stored in the first memory 202.
Fig. 6 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 6 does not constitute a limitation of the electronic device 200, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
The human body reconstruction program 203 based on image stepwise generation stored in the first memory 202 of the electronic device 200 is a combination of a plurality of instructions, which when executed in the first processor 201, can realize:
extracting image features of a given human body image, decoding the image features to generate heat maps of dimensions in the x direction, the y direction and the z direction, connecting the heat map distribution of the dimensions in the x direction, the y direction and the z direction, separating the heat map distribution of the dimensions in the x direction, the y direction and the z direction after coding, and performing heat map integration to regress coordinates of human body joint points;
gradually generating a final human body mesh model after the heat map distribution is subjected to information interaction of joint points and model vertexes, human body model vertex sampling and batch regularization;
and inputting the coordinates of the human body joint points into a human body grid model, outputting the coordinates of the human body joint points corresponding to the human body model after passing through a universal SMPL model regression device, and finishing human body reconstruction.
Further, the modules/units integrated with the electronic device 200, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-volatile computer-readable storage medium. The computer readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program that can be stored in a non-volatile computer-readable storage medium and that, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A method for human reconstruction based on a stepwise image generation scheme, comprising the steps of:
extracting image features of a given human body image, decoding the image features to generate heat maps of dimensions in the x direction, the y direction and the z direction, connecting the heat map distribution of the dimensions in the x direction, the y direction and the z direction, separating the heat map distribution of the dimensions in the x direction, the y direction and the z direction after coding, and performing heat map integration to regress coordinates of a human body joint point;
the heat map distribution of the human body joint points is subjected to information interaction between the human body joint points and vertices of a human body mesh model through a Transformer encoder, upsampling of the vertices of the human body mesh model and batch regularization, and then a final human body mesh model is generated step by step;
and inputting the coordinates of the human body joint points into a human body mesh model, and outputting the coordinates of the human body joint points corresponding to the human body model after passing through a universal SMPL model regressor to serve as reconstruction constraints of the human body mesh model.
2. The image-based human body reconstruction method of step-by-step generation formula according to claim 1, wherein the extracting of the image features of the given human body image is specifically:
giving a human body image, cutting the human body image based on a labeling frame or a detection frame, removing background influence, keeping a pure human body image, and adjusting the size of the human body image to enable the human body image to be matched with subsequent neural network processing;
inputting the processed image into an encoder for feature extraction to obtain the human body image feature $F_P$, whose feature dimension is $c \times h \times w$;
for the obtained human body image feature, first using a deconvolution operation in the x-axis and y-axis directions
Figure FDA0003477483580000011
to raise the dimension of the feature map $F_P$ to $c' \times 8h \times 8w$; then, for the x-axis direction, performing an averaging operation $\mathrm{avg}_y$ over the y-axis dimension and applying a one-dimensional convolution
Figure FDA0003477483580000012
to obtain the heat map distribution $P_{H,x} \in \mathbb{R}^{J \times 8h}$ in the x-axis direction; similarly, for the y-axis direction, performing an averaging operation $\mathrm{avg}_x$ over the x-axis dimension and applying a one-dimensional convolution
Figure FDA0003477483580000013
to obtain the heat map distribution $P_{H,y} \in \mathbb{R}^{J \times 8w}$ in the y-axis direction:
Figure FDA0003477483580000014
Figure FDA0003477483580000015
for the z-axis direction, performing an averaging operation $\mathrm{avg}_{x,y}$ over the x-axis and y-axis dimensions, and using a one-dimensional convolution $f_p$ and a feature reshaping operation
Figure FDA0003477483580000016
to convert the feature dimension to $c' \times D$, and finally applying a one-dimensional convolution
Figure FDA0003477483580000017
to obtain the heat map distribution $P_{H,z} \in \mathbb{R}^{J \times D}$ in the z-axis direction:
Figure FDA0003477483580000018
3. The image-based human body reconstruction method of stepwise generation formula according to claim 2, wherein the heat map distributions of the three dimensions x, y, and z are connected, and the heat map distributions of the three dimensions x, y, and z are separated after being encoded by a Transformer encoder, specifically:
for the heat map distributions $P_{H,x}, P_{H,y}, P_{H,z} \in \mathbb{R}^{J \times 64}$ of the three dimensions x, y and z, fusing them along the last dimension to obtain the fused feature $P_H = [P_{H,x}, P_{H,y}, P_{H,z}] \in \mathbb{R}^{J \times 192}$, which contains the heat map distribution information of the three directional dimensions, and then using the fused feature as the input of the Transformer encoder;
inputting the fused feature $P_H$ into N layers of attention modules to carry out heat map information interaction between different directional dimensions and different joint points; wherein each layer of attention module performs four operations in turn: multi-head attention, residual connection and regularization, feedforward network processing, and residual connection and regularization, and finally outputs a heat map distribution $P'_H \in \mathbb{R}^{J \times 192}$;
passing the output heat map distribution $P'_H$ through the independent fully connected layers $fc_x$, $fc_y$, $fc_z$ respectively for feature mapping, summing the results with the original heat map distributions $P_{H,x}$, $P_{H,y}$, $P_{H,z}$, and re-separating them into the heat map distributions $P'_{H,x}, P'_{H,y}, P'_{H,z}$ of the three dimensions x, y and z:
$P'_{H,x} = P_{H,x} + fc_x(P'_H)$
$P'_{H,y} = P_{H,y} + fc_y(P'_H)$
$P'_{H,z} = P_{H,z} + fc_z(P'_H)$
for the heat map distributions of the three directional dimensions of each joint point, regressing the coordinate points using soft-argmax to obtain $P'_{C,x}, P'_{C,y}, P'_{C,z} \in \mathbb{R}^{J \times 1}$, and then performing a concatenation operation to obtain the final human body joint point coordinates $P'_C = [P'_{C,x}, P'_{C,y}, P'_{C,z}] \in \mathbb{R}^{J \times 3}$.
4. An image-based step-by-step human reconstruction method according to claim 3, wherein soft-argmax is defined as follows:
Figure FDA0003477483580000021
5. The image-based step-by-step human body reconstruction method according to claim 1, wherein the human body joint points and the human body mesh model vertex information are interacted, specifically:
given the heat map feature $F_H = [P'_{H,x}, P'_{H,y}, P'_{H,z}] \in \mathbb{R}^{J \times 192}$, obtaining the feature $F_{embed} = F_H + PE$ by position embedding, where the position embedding $PE$ is defined as follows:
Figure FDA0003477483580000022
where $pos$ is the position, $i$ is the feature dimension index, and $d_{model}$ is the feature dimension;
inputting the position-embedded feature $F_{embed}$ into a Transformer encoder, wherein the Transformer encoder comprises N blocks, and each block comprises a multi-head attention module and a feedforward neural network; in each block, $F_{embed}$ first passes through the multi-head attention module, which computes the normalized attention weights, and then undergoes feature transformation through the feedforward neural network; after N blocks, the final output is
Figure FDA0003477483580000025
namely the vertex heat map distribution of the human body model with V vertices.
6. The image-based stepwise generation human body reconstruction method according to claim 1, wherein the human body model vertex upsampling specifically comprises:
inputting the output processed by the Transformer encoder
Figure FDA0003477483580000023
into a 1 × 1 convolutional layer for the upsampling operation, the output being
Figure FDA0003477483580000024
that is, after the 1 × 1 convolution operation, the number of existing model vertices is doubled, thereby refining the model.
7. The image-based step-by-step human body reconstruction method according to claim 1, wherein the human body joint point coordinates are input into a human body mesh model, and after passing through a general SMPL model regressor, the human body joint point coordinates corresponding to the human body model are output to complete human body reconstruction, specifically:
Figure FDA0003477483580000031
wherein SMPL.J_regressor $\in \mathbb{R}^{24 \times 6890}$ is the joint regression matrix of the SMPL model, $M'_C \in \mathbb{R}^{6890 \times 3}$ is the human body mesh model estimated in the above method, and
Figure FDA0003477483580000032
are the human body joint point coordinates obtained by regression.
8. An image-based step-by-step generation human body reconstruction system, characterized by being applied to the image-based step-by-step generation human body reconstruction method of any one of claims 1 to 7, and comprising a human body posture estimation module, a human body mesh generation module and a human body joint point regression module;
the human body posture estimation module is used for extracting image features of a given human body image, decoding the image features to generate heat maps of dimensions in the x direction, the y direction and the z direction, connecting the heat map distribution of the dimensions in the x direction, the y direction and the z direction, separating the heat map distribution of the dimensions in the x direction, the y direction and the z direction after coding, and performing heat map integration to regress the coordinates of human body joint points;
the human body mesh generation module is used for subjecting the heat map distribution of the human body joint points to information interaction between the human body joint points and vertices of a human body mesh model through a Transformer encoder, upsampling of the vertices of the human body mesh model and batch regularization, and then generating a final human body mesh model step by step;
and the human body joint point regression module is used for inputting the human body joint point coordinates into the human body mesh model, outputting the human body joint point coordinates corresponding to the human body model after passing through the general SMPL model regression device, and using the human body joint point coordinates as reconstruction constraints of the human body mesh model.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the method of image-based progressive generation human reconstruction as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the image-based progressive generation human reconstruction method of any one of claims 1 to 7.
CN202210059026.8A 2022-01-19 2022-01-19 Image-based step-by-step generation type human body reconstruction method and device Pending CN114419277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210059026.8A CN114419277A (en) 2022-01-19 2022-01-19 Image-based step-by-step generation type human body reconstruction method and device

Publications (1)

Publication Number Publication Date
CN114419277A true CN114419277A (en) 2022-04-29

Family

ID=81273854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210059026.8A Pending CN114419277A (en) 2022-01-19 2022-01-19 Image-based step-by-step generation type human body reconstruction method and device

Country Status (1)

Country Link
CN (1) CN114419277A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147547A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Human body reconstruction method and device
CN115147547B (en) * 2022-06-30 2023-09-19 北京百度网讯科技有限公司 Human body reconstruction method and device
WO2024124485A1 (en) * 2022-12-15 2024-06-20 中国科学院深圳先进技术研究院 Three-dimensional human body reconstruction method and apparatus, device, and storage medium
CN117726907A (en) * 2024-02-06 2024-03-19 之江实验室 Training method of modeling model, three-dimensional human modeling method and device
CN117726907B (en) * 2024-02-06 2024-04-30 之江实验室 Training method of modeling model, three-dimensional human modeling method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination