CN114419277A - Image-based step-by-step generation type human body reconstruction method and device

Info

Publication number: CN114419277A
Application number: CN202210059026.8A
Authority: CN
Inventors: 邝嘉健; 郑伟诗; 高义朋
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-01-19
Filing date: 2022-01-19
Publication date: 2022-04-29

Abstract

The invention discloses an image-based step-by-step generation type human body reconstruction method and device. The method comprises the following steps: extracting image features of a given human body image, decoding them to generate heat maps in the x, y and z direction dimensions, concatenating the heat maps, encoding them, separating them again, and performing heat map integral regression to obtain the coordinates of the human body joint points; passing the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the human body joint points and the vertices of a human body mesh model, then through upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step; and inputting the vertex coordinates of the human body mesh model into a general SMPL model regressor, which outputs the human body joint point coordinates corresponding to the human body model as reconstruction constraints of the human body mesh model. On the basis of three-dimensional human body posture estimation, the invention introduces an attention mechanism to optimize the heat map distributions of the different direction dimensions, and generates the human body mesh model step by step following a coarse-to-fine approach.

Description

Image-based step-by-step generation type human body reconstruction method and device

Technical Field

The invention relates to the technical field of image processing, in particular to a step-by-step generation type human body reconstruction method and device based on images.

Background

Existing deep-learning-based human body reconstruction techniques fall mainly into two types. The first type is three-dimensional human body reconstruction based on a parameterized model: human body model parameters are estimated by a neural network, and a three-dimensional human body mesh model is generated directly from the parameterized model. The second type does not use a parameterized model to generate the human body mesh model, but directly regresses the vertex coordinates of the three-dimensional human body model from image feature information.

Directly regressing the vertex coordinates of the three-dimensional human body model from image feature information is generally realized in one of two ways. In the first, a neural network estimates the heat map distributions of the three-dimensional human body posture and of the three-dimensional human body mesh model along the x, y and z axes, and the human body joint point coordinates and mesh model vertex coordinates are then regressed by heat map integration. In the second, the three-dimensional mesh coordinates of the human body model are regressed step by step with a Transformer and a progressive dimension reduction approach, relying on a human body model template.

In the prior art, the main disadvantage of the first approach is that the heat map of each direction dimension is estimated relatively independently, so the correlation between the heat map information of different direction dimensions is lost. The main disadvantage of the second approach is that a human body model template is required, the amount of computation is large, and the training time is long.

Disclosure of Invention

The main purpose of the invention is to overcome the defects of the prior art and provide an image-based step-by-step generation type human body reconstruction method and device. An attention mechanism is introduced to learn the relationship between the heat map distributions of the different direction dimensions, and three-dimensional human body joint point estimation and human body mesh model reconstruction are optimized through information interaction between these heat maps. At the same time, based on the heat map information of the human body joint points, the vertex coordinates of the human body mesh model are regressed step by step from coarse to fine, so that the model does not depend on a human body model template and both the amount of computation and the training time are reduced.

In order to achieve the purpose, the invention adopts the following technical scheme:

The invention provides an image-based step-by-step generation type human body reconstruction method, which comprises the following steps:

extracting image features of a given human body image, decoding the image features to generate heat maps in the x, y and z direction dimensions, concatenating the heat map distributions of the three direction dimensions, separating them again after encoding, and performing heat map integration to regress the coordinates of the human body joint points;

passing the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the human body joint points and the vertices of a human body mesh model, then through upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step;

and inputting the vertex coordinates of the human body mesh model into a general SMPL model regressor, which outputs the human body joint point coordinates corresponding to the human body model to serve as reconstruction constraints of the human body mesh model.

As a preferred technical solution, the extracting of the image feature of the given human body image specifically includes:

given a human body image, cropping it based on an annotated bounding box or a detected bounding box to remove the influence of the background and keep only the human body, and resizing the cropped image to match the subsequent neural network processing;

inputting the processed image into an encoder for feature extraction to obtain the human body image features $F_P$, whose feature dimension is $c \times h \times w$;

for the obtained human body image features, first applying a deconvolution operation in the x-axis and y-axis directions to raise the feature map $F_P$ to dimension $c' \times 8h \times 8w$. Then, for the x-axis direction, performing an averaging operation $\mathrm{avg}^{y}$ over the y-axis dimension and applying a one-dimensional convolution to obtain the heat map distribution in the x-axis direction, $P^{H,x} \in \mathbb{R}^{J \times 8h}$; similarly, for the y-axis direction, performing an averaging operation $\mathrm{avg}^{x}$ over the x-axis dimension and applying a one-dimensional convolution to obtain the heat map distribution in the y-axis direction, $P^{H,y} \in \mathbb{R}^{J \times 8w}$;

for the z-axis direction, performing an averaging operation $\mathrm{avg}^{x,y}$ over the x-axis and y-axis dimensions, converting the feature dimension into $c' \times D$ by means of a one-dimensional convolution $f_p$ and a feature reshaping operation, and finally applying a one-dimensional convolution to obtain the heat map distribution in the z-axis direction, $P^{H,z} \in \mathbb{R}^{J \times D}$.

As a preferred technical solution, concatenating the heat map distributions of the x, y and z direction dimensions and separating them again after encoding by a Transformer encoder specifically comprises:

fusing the heat map distributions of the three direction dimensions, $P^{H,x}, P^{H,y}, P^{H,z} \in \mathbb{R}^{J \times 64}$, along the last dimension to obtain the fused feature $P^{H} = [P^{H,x}, P^{H,y}, P^{H,z}] \in \mathbb{R}^{J \times 192}$, which contains the heat map distribution information of all three direction dimensions and is then used as the input of the Transformer encoder;

inputting the fused feature $P^{H}$ into N layers of attention modules for heat map information interaction between different direction dimensions and between different joint points, wherein each attention module performs four operations in turn, namely multi-head attention, residual connection and regularization, feedforward network processing, and residual connection and regularization, and the final output is the heat map distribution $P'^{H} \in \mathbb{R}^{J \times 192}$;

passing the output heat map distribution $P'^{H}$ through the independent fully connected layers $fc^{x}$, $fc^{y}$ and $fc^{z}$ for feature mapping, and summing the results with the original heat map distributions $P^{H,x}$, $P^{H,y}$ and $P^{H,z}$, thereby re-separating it into the heat map distributions $P'^{H,x}, P'^{H,y}, P'^{H,z}$ of the three direction dimensions:

$P'^{H,x} = P^{H,x} + fc^{x}(P'^{H})$

$P'^{H,y} = P^{H,y} + fc^{y}(P'^{H})$

$P'^{H,z} = P^{H,z} + fc^{z}(P'^{H})$

for the heat map distributions of the three direction dimensions of each joint point, regressing the coordinates using soft-argmax to obtain $P'^{C,x}, P'^{C,y}, P'^{C,z} \in \mathbb{R}^{J \times 1}$, and then concatenating them to obtain the final human body joint point coordinates $P'^{C} = [P'^{C,x}, P'^{C,y}, P'^{C,z}] \in \mathbb{R}^{J \times 3}$.

As a preferred technical scheme, the soft-argmax is defined as follows:

As a preferred technical scheme, the information interaction between the human body joint points and the vertices of the human body mesh model specifically comprises:

given the heat map feature $F_{H} = [P'^{H,x}, P'^{H,y}, P'^{H,z}] \in \mathbb{R}^{J \times 192}$, obtaining the feature $F_{embed} = F_{H} + PE$ by position embedding, where the position embedding $PE$ is defined as follows:

where $pos$ is the position, $i$ is the feature dimension index, and $d_{model}$ is the feature dimension;

inputting the position-embedded feature $F_{embed}$ into a Transformer encoder, wherein the Transformer encoder comprises N blocks and each block comprises a multi-head attention module and a feedforward neural network; in each block, $F_{embed}$ first passes through the multi-head attention module, which computes normalized attention weights, and then through the feedforward neural network for feature transformation; the final output after N blocks

is the vertex heat map distribution of the human body model with V vertices.

As a preferred technical solution, the upsampling of the human body model vertices specifically comprises:

inputting the output of the Transformer encoder into a 1 × 1 convolutional layer for an upsampling operation;

that is, after the 1 × 1 convolution operation the number of existing model vertices is doubled, thereby refining the model.

As a preferred technical solution, inputting the vertex coordinates of the human body mesh model into the general SMPL model regressor and outputting the human body joint point coordinates corresponding to the human body model specifically comprises:

applying SMPL.J_regressor $\in \mathbb{R}^{24 \times 6890}$, the joint regression matrix of the SMPL model, to $M'^{C} \in \mathbb{R}^{6890 \times 3}$, the human body mesh model estimated by the above method,

and obtaining the coordinates of the human body joint points by regression.

Another aspect of the invention provides an image-based step-by-step generation type human body reconstruction system, which applies the image-based step-by-step generation type human body reconstruction method and comprises a human body posture estimation module, a human body mesh generation module and a human body joint point regression module;

the human body posture estimation module is used for extracting image features of a given human body image, decoding the image features to generate heat maps in the x, y and z direction dimensions, concatenating the heat map distributions of the three direction dimensions, separating them again after encoding, and performing heat map integral regression to obtain the coordinates of the human body joint points;

the human body mesh generation module is used for passing the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the human body joint points and the vertices of a human body mesh model, then through upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step;

and the human body joint point regression module is used for inputting the vertex coordinates of the human body mesh model into the general SMPL model regressor and outputting the human body joint point coordinates corresponding to the human body model as reconstruction constraints of the human body mesh model.

Yet another aspect of the present invention provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores computer program instructions executable by the at least one processor to cause the at least one processor to perform the image-based step-wise generation human reconstruction method.

Still another aspect of the present invention provides a computer-readable storage medium storing a program which, when executed by a processor, implements the image-based progressive generation type human body reconstruction method.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. The invention introduces an attention mechanism, so that human body posture estimation and shape reconstruction can make better use of the correlation between the heat map distributions of different direction dimensions, thereby fine-tuning those distributions. Existing schemes estimate the human body posture and reconstruct the body shape directly from the heat map of each single direction dimension, and lose the interaction of information across direction dimensions.

2. The invention adopts a coarse-to-fine, step-by-step generation scheme for human body reconstruction, which removes the dependence on a human body model template and reduces the time required for training; existing methods depend on a human body model template and involve a large amount of computation and a long training time.

3. Compared with a scheme that generates the mesh directly, the coarse-to-fine step-by-step generation scheme reduces the GPU memory required for training.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of an image-based step-by-step generation type human body reconstruction method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of human body posture estimation according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating heat map information interaction between different direction dimensions and different joint points according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of human body mesh generation according to an embodiment of the invention;

FIG. 5 is a schematic structural diagram of an image-based step-by-step generation type human body reconstruction system according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art may better understand the technical solution of the present invention, it is described clearly and completely below with reference to the embodiments and the accompanying drawings. The drawings are for illustrative purposes only and are not to be construed as limiting the patent. The embodiments described are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present application.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

As shown in FIG. 1, the image-based step-by-step generation type human body reconstruction method of this embodiment mainly comprises three steps: (1) human body posture estimation; (2) human body mesh generation; (3) human body joint point regression. On the one hand, an attention mechanism is introduced to learn the relationship between the heat map distributions of the different direction dimensions, and three-dimensional human body joint point estimation and human body mesh model reconstruction are optimized through information interaction between these heat maps. On the other hand, based on the heat map information of the human body joint points, the vertex coordinates of the human body mesh model are regressed step by step in a coarse-to-fine manner, so that the model is independent of a human body model template and both the amount of computation and the training time are reduced.

The image-based step-by-step generation type human body reconstruction method is detailed below following its workflow:

(1) Human body posture estimation;

As shown in FIG. 2, the human body posture estimation module is implemented with an encoder, a decoder and a Transformer encoder. A human body image I is given as input. The encoder extracts the image features F from I, and the decoder decodes F to generate the heat map distributions in the x, y and z direction dimensions. These heat map distributions are concatenated and input into an N-layer Transformer encoder, the output is separated again into the heat map distributions of the x, y and z direction dimensions, and the coordinates of the human body joint points are then regressed by heat map integration.

Further, the specific process of the human body posture estimation is as follows:

(1.1) extracting human body image features;

(1.1.1) Given a human body image I, crop the image based on an annotated bounding box or a detected bounding box to remove the influence of the background and keep only the human body, and resize the cropped image to match the subsequent neural network processing.
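The cropping and resizing step can be sketched as follows. This is a minimal NumPy illustration rather than the implementation of the embodiment: the function name `crop_and_resize`, the `(x0, y0, x1, y1)` box layout, nearest-neighbour resampling and the 256-pixel output size are all assumptions made for the example.

```python
import numpy as np

def crop_and_resize(image, box, out_size=256):
    """Crop `image` (H, W, C) to `box` = (x0, y0, x1, y1) and resize the crop
    to out_size x out_size with nearest-neighbour sampling (assumed choice)."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    # Map every output row/column back to a source row/column of the crop.
    rows = np.arange(out_size) * crop.shape[0] // out_size
    cols = np.arange(out_size) * crop.shape[1] // out_size
    return crop[rows][:, cols]

image = np.zeros((100, 80, 3), dtype=np.uint8)   # dummy input image
person = crop_and_resize(image, box=(10, 20, 50, 90))
print(person.shape)  # (256, 256, 3)
```

In practice a bilinear resize from an image library would usually replace the nearest-neighbour indexing; the sketch only shows where the bounding box enters the pipeline.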

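(1.1.2) Input the processed image into an encoder for feature extraction to obtain the human body image features $F_P$, whose feature dimension is $c \times h \times w$.

For the obtained human body image features, a deconvolution operation in the x-axis and y-axis directions first raises the feature map $F_P$ to dimension $c' \times 8h \times 8w$. Then, for the x-axis direction, an averaging operation $\mathrm{avg}^{y}$ is performed over the y-axis dimension and a one-dimensional convolution is applied to obtain the heat map distribution in the x-axis direction, $P^{H,x} \in \mathbb{R}^{J \times 8h}$. Similarly, for the y-axis direction, an averaging operation $\mathrm{avg}^{x}$ is performed over the x-axis dimension and a one-dimensional convolution is applied

to obtain the heat map distribution in the y-axis direction, $P^{H,y} \in \mathbb{R}^{J \times 8w}$.

For the z-axis direction, an averaging operation $\mathrm{avg}^{x,y}$ is performed over the x-axis and y-axis dimensions, the feature dimension is converted into $c' \times D$ by a one-dimensional convolution $f_p$ and a feature reshaping operation, and a final one-dimensional convolution yields the heat map distribution in the z-axis direction, $P^{H,z} \in \mathbb{R}^{J \times D}$.

In the specific implementation of this embodiment, $8h = 8w = D = 64$.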
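The axis-wise heat map generation described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the embodiment's network: the channel count `C_UP`, the joint count `NUM_JOINTS`, the random weights, the use of kernel-size-1 convolutions, and the mapping of the x and y directions to array axes 1 and 2 are all hypothetical choices; only the averaging pattern and the output shapes follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
C_UP, SIZE, DEPTH, NUM_JOINTS = 8, 64, 64, 17   # hypothetical c', 8h = 8w, D, J

def conv1d_k1(x, weight):
    """Kernel-size-1 1D convolution: x (c_in, L), weight (c_out, c_in) -> (c_out, L)."""
    return weight @ x

feat = rng.standard_normal((C_UP, SIZE, SIZE))   # upsampled map, c' x 8h x 8w

# x direction: average out the y axis (assumed array axis 2), then map c' -> J.
heat_x = conv1d_k1(feat.mean(axis=2), rng.standard_normal((NUM_JOINTS, C_UP)))

# y direction: average out the x axis (assumed array axis 1), then map c' -> J.
heat_y = conv1d_k1(feat.mean(axis=1), rng.standard_normal((NUM_JOINTS, C_UP)))

# z direction: average out both spatial axes, lift c' -> c' * D with a 1D
# convolution, reshape to c' x D, then map c' -> J.
pooled = feat.mean(axis=(1, 2))[:, None]                                  # (c', 1)
lifted = conv1d_k1(pooled, rng.standard_normal((C_UP * DEPTH, C_UP)))     # (c'*D, 1)
heat_z = conv1d_k1(lifted.reshape(C_UP, DEPTH),
                   rng.standard_normal((NUM_JOINTS, C_UP)))

print(heat_x.shape, heat_y.shape, heat_z.shape)  # (17, 64) (17, 64) (17, 64)
```

Each direction ends up with one 64-bin heat map per joint, which is the $J \times 64$ layout used in the following steps.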

Furthermore, a residual network ResNet50 may be selected for feature extraction. ResNet50 has two basic blocks, the Conv Block and the Identity Block. The input and output dimensions of the Conv Block differ, so Conv Blocks cannot be connected in series; their function is to change the dimensions of the network. The input and output dimensions of the Identity Block are the same, so Identity Blocks can be connected in series to deepen the network. In this embodiment, ResNet50 extracts human body features well. Of course, feature extraction in the present application is not limited to ResNet50; other residual networks capable of implementing the technical solution of the present application are equally applicable and are not described again here.

(1.2) generating x, y and z direction dimension heat maps;

(1.2.1) Input the extracted human body image features F into a decoder, which outputs multi-dimensional heat map distributions for the three direction dimensions x, y and z. In this embodiment, 64-dimensional heat map distributions of the J joint points are output: $P^{H,x}, P^{H,y}, P^{H,z} \in \mathbb{R}^{J \times 64}$.

(1.3) Exchanging heat map information between different direction dimensions;

(1.3.1) The heat map distributions of the three direction dimensions, $P^{H,x}, P^{H,y}, P^{H,z} \in \mathbb{R}^{J \times 64}$, are fused by concatenation along the last dimension to obtain the fused feature $P^{H} = [P^{H,x}, P^{H,y}, P^{H,z}] \in \mathbb{R}^{J \times 192}$. This feature contains the heat map distribution information of all three direction dimensions and is used as the input of the Transformer encoder.

(1.3.2) The fused feature $P^{H}$ is input into N layers of attention modules (see FIG. 3) for heat map information interaction between different direction dimensions and between different joint points. Each attention module performs four operations in sequence: multi-head attention, residual connection and regularization, feedforward network processing, and residual connection and regularization. The final output is the heat map distribution $P'^{H} \in \mathbb{R}^{J \times 192}$.
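One attention layer of the kind described in (1.3.2) can be sketched in NumPy as below. The sketch assumes that "regularization" after each residual connection corresponds to layer normalization as in a standard Transformer encoder, and it uses hypothetical sizes (8 heads, a feedforward width of 384, 17 joints) and random weights; none of these values come from the embodiment.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each token (row) to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, p, heads):
    n, d = x.shape
    dh = d // heads
    split = lambda t: t.reshape(n, heads, dh).transpose(1, 0, 2)   # (heads, n, dh)
    q, k, v = split(x @ p["Wq"]), split(x @ p["Wk"]), split(x @ p["Wv"])
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))         # (heads, n, n)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return merged @ p["Wo"]

def encoder_layer(x, p, heads=8):
    # 1) multi-head attention, 2) residual + normalisation,
    # 3) feedforward network,  4) residual + normalisation.
    x = layer_norm(x + multi_head_attention(x, p, heads))
    ffn = np.maximum(x @ p["W1"], 0.0) @ p["W2"]
    return layer_norm(x + ffn)

rng = np.random.default_rng(0)
J, D_MODEL, D_FF = 17, 192, 384                  # hypothetical J and FFN width
params = {name: rng.standard_normal(shape) * 0.05 for name, shape in {
    "Wq": (D_MODEL, D_MODEL), "Wk": (D_MODEL, D_MODEL),
    "Wv": (D_MODEL, D_MODEL), "Wo": (D_MODEL, D_MODEL),
    "W1": (D_MODEL, D_FF), "W2": (D_FF, D_MODEL)}.items()}

fused = rng.standard_normal((J, D_MODEL))        # stand-in for the fused feature
encoded = encoder_layer(fused, params)
print(encoded.shape)  # (17, 192)
```

Because every joint token holds its x, y and z heat maps side by side, attention across the J tokens mixes information both between joints and between direction dimensions. Stacking N such layers gives the N-layer encoder.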

(1.3.3) The output heat map distribution $P'^{H}$ is passed through the independent fully connected layers $fc^{x}$, $fc^{y}$ and $fc^{z}$ for feature mapping, and the results are summed with the original heat map distributions $P^{H,x}$, $P^{H,y}$ and $P^{H,z}$, which re-separates it into the heat map distributions $P'^{H,x}, P'^{H,y}, P'^{H,z}$ of the three direction dimensions:

$P'^{H,x} = P^{H,x} + fc^{x}(P'^{H})$

$P'^{H,y} = P^{H,y} + fc^{y}(P'^{H})$

$P'^{H,z} = P^{H,z} + fc^{z}(P'^{H})$
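The concatenation of (1.3.1) and the residual re-separation of (1.3.3) can be illustrated with the following NumPy sketch. The joint count, the random weights and the helper `make_fc` are hypothetical, and the encoder output is replaced by a random placeholder so that the example stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
J, BINS = 17, 64                                   # hypothetical joint count; 64 bins per axis

heat_x, heat_y, heat_z = (rng.standard_normal((J, BINS)) for _ in range(3))

# (1.3.1) fuse along the last dimension: three J x 64 maps -> one J x 192 feature.
fused = np.concatenate([heat_x, heat_y, heat_z], axis=-1)

# Placeholder for the Transformer encoder output P'^H (same shape as its input).
encoded = rng.standard_normal(fused.shape)

def make_fc(in_dim, out_dim):
    """Hypothetical fully connected layer with small random weights."""
    weight = rng.standard_normal((in_dim, out_dim)) * 0.01
    bias = np.zeros(out_dim)
    return lambda x: x @ weight + bias

fc_x, fc_y, fc_z = (make_fc(3 * BINS, BINS) for _ in range(3))

# (1.3.3) each axis keeps its original heat map and adds an axis-specific
# correction computed from the jointly encoded feature.
refined_x = heat_x + fc_x(encoded)
refined_y = heat_y + fc_y(encoded)
refined_z = heat_z + fc_z(encoded)

print(fused.shape, refined_x.shape)  # (17, 192) (17, 64)
```

The residual form means the attention branch only has to learn an adjustment to each direction's heat map rather than the heat map itself.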

(1.3.4) For the heat map distributions of the three direction dimensions of each joint point, the coordinates are regressed using soft-argmax to obtain $P'^{C,x}, P'^{C,y}, P'^{C,z} \in \mathbb{R}^{J \times 1}$, which are then concatenated to obtain the final human body joint point coordinates $P'^{C} = [P'^{C,x}, P'^{C,y}, P'^{C,z}] \in \mathbb{R}^{J \times 3}$.

Further, in the present embodiment, soft-argmax is defined as follows:
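A commonly used form of soft-argmax, given here as an assumption rather than as the exact formula of the embodiment, normalizes each heat map with a softmax and takes the expected bin index, $\sum_i i \cdot e^{h_i} / \sum_j e^{h_j}$. A minimal NumPy sketch, with a hypothetical joint count:

```python
import numpy as np

def soft_argmax(heatmap):
    """heatmap: (J, L) unnormalised scores -> (J, 1) expected bin index."""
    shifted = heatmap - heatmap.max(axis=-1, keepdims=True)   # numerical stability
    prob = np.exp(shifted)
    prob /= prob.sum(axis=-1, keepdims=True)
    index = np.arange(heatmap.shape[-1])
    return (prob * index).sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
J, BINS = 17, 64                                              # hypothetical joint count
hx, hy, hz = (rng.standard_normal((J, BINS)) for _ in range(3))

# One coordinate per axis, concatenated into J x 3 joint coordinates.
coords = np.concatenate([soft_argmax(hx), soft_argmax(hy), soft_argmax(hz)], axis=-1)
print(coords.shape)  # (17, 3)
```

Unlike a hard argmax, this expectation is differentiable with respect to the heat map, which is what allows the joint loss to train the heat maps end to end.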

(1.3.5) The human body posture estimation module adopts joint loss and bone loss as training indexes. The joint loss is defined as $L_{joint} = \|P'^{C} - P^{C}\|_{1}$,

wherein $P^{C}$ is the ground-truth data and $P'^{C}$ is the network prediction.
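The joint loss is an L1 norm of the difference between predicted and ground-truth joint coordinates. A small worked example follows; summing over all entries is an assumption of this sketch, since training frameworks often average instead.

```python
import numpy as np

def l1_loss(pred, target):
    """Sum of absolute differences between prediction and ground truth."""
    return np.abs(pred - target).sum()

pred = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])     # two predicted joints
target = np.array([[0.0, 1.0, 0.0], [1.0, 2.0, 5.0]])   # their ground truth
print(l1_loss(pred, target))  # 3.0
```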

(2) Generating a human body mesh model;

As shown in FIG. 4, the intermediate output $P'^{H} = [P'^{H,x}, P'^{H,y}, P'^{H,z}] \in \mathbb{R}^{J \times 192}$ given in step (1) undergoes the following operations several times: first, the heat map features are input into a Transformer encoder for information interaction; next, the heat map features are input into a 1 × 1 convolutional network for mesh vertex upsampling, which increases the number of vertices of the mesh model; then, batch regularization is applied to the heat map features. In this way the final human body mesh model is generated step by step.

The specific process of human body mesh model generation is as follows:

Here, assume that the heat map feature input to a single-layer operation is $F^{H} \in \mathbb{R}^{B \times V \times 192}$, where B is the batch size, V is the sum of the current number of joint points and model vertices, and 192 is the current feature dimension.

(2.1) interacting joint points and model vertex information;

(2.1.1) Given the heat map feature $F_{H} = [P'^{H,x}, P'^{H,y}, P'^{H,z}] \in \mathbb{R}^{J \times 192}$, the feature $F_{embed} = F_{H} + PE$ is obtained by position embedding, where the position embedding $PE$ is defined as follows:

where $pos$ is the position, $i$ is the feature dimension index, and $d_{model}$ is the feature dimension (value 256).
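The symbols $pos$, $i$ and $d_{model}$ match the sinusoidal position embedding of the standard Transformer, $PE(pos, 2i) = \sin(pos / 10000^{2i/d_{model}})$ and $PE(pos, 2i+1) = \cos(pos / 10000^{2i/d_{model}})$. Assuming that this standard form is the one intended, a NumPy sketch looks like this (the number of positions is a hypothetical value):

```python
import numpy as np

def sinusoidal_position_embedding(num_positions, d_model):
    """Standard Transformer sinusoidal embedding (assumed form), shape (num_positions, d_model)."""
    pos = np.arange(num_positions)[:, None]          # (num_positions, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model / 2)
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angle)                      # even feature indices
    pe[:, 1::2] = np.cos(angle)                      # odd feature indices
    return pe

pe = sinusoidal_position_embedding(num_positions=17, d_model=256)
print(pe.shape)  # (17, 256)
```

The sketch keeps `d_model` as a parameter because the embedding can only be added element-wise to a feature of the same width.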

The position-embedded feature $F_{embed}$ is input into a Transformer encoder comprising N blocks, each of which contains a multi-head attention module and a feedforward neural network (see FIG. 3). In each block, $F_{embed}$ first passes through the multi-head attention module, which computes normalized attention weights, and then through the feedforward neural network for feature transformation. The output after N blocks is the vertex heat map distribution of the human body model with V vertices.

(2.2) model vertex upsampling;

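(2.2.1) The output of the Transformer encoder is input into a 1 × 1 convolutional layer for an upsampling operation; after the 1 × 1 convolution, the number of existing model vertices is doubled, which refines the model.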

(2.3) batch regularization;

(2.3.1) Batch regularization is performed on the upsampled vertex heat map.
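Steps (2.2) and (2.3) can be sketched as follows. The sketch treats the vertex axis as the channel axis of the 1 × 1 convolution, so that a weight of shape $2V \times V$ doubles the vertex count, and it interprets "batch regularization" as batch normalization over the batch and vertex axes without learned scale and shift. Both readings, together with the batch size, the vertex count and the random weights, are assumptions of the example.

```python
import numpy as np

def upsample_vertices(feat, weight, bias):
    """1x1 convolution over the vertex axis: (B, V, C) -> (B, 2V, C).
    Every output vertex is a learned linear mix of the input vertices."""
    return np.einsum("uv,bvc->buc", weight, feat) + bias[None, :, None]

def batch_norm(feat, eps=1e-5):
    """Normalise each feature channel over the batch and vertex axes (assumed layout)."""
    mean = feat.mean(axis=(0, 1), keepdims=True)
    var = feat.var(axis=(0, 1), keepdims=True)
    return (feat - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
B, V, C = 4, 32, 192                                  # hypothetical batch and vertex counts
feat = rng.standard_normal((B, V, C))                 # encoder output F^H

weight = rng.standard_normal((2 * V, V)) * 0.1        # hypothetical 1x1 conv weights
bias = np.zeros(2 * V)

refined = batch_norm(upsample_vertices(feat, weight, bias))
print(refined.shape)  # (4, 64, 192)
```

Repeating the encoder, upsampling and normalization stages several times doubles the vertex count at each stage.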

In summary, after the operations of steps (2.1) to (2.3), the number of vertices of the human body model is doubled, which realizes the coarse-to-fine construction of the model. Finally, as in the human body posture estimation module, the output heat map distribution $F'^{H}$ is re-separated into the heat map distributions $F'^{H,x}, F'^{H,y}, F'^{H,z}$ of the three direction dimensions x, y and z, and the coordinates are regressed using soft-argmax. The vertex loss is introduced as the optimization objective: $L_{vertex} = \|V'^{C} - V^{C}\|_{1}$, where $V^{C}$ is the ground-truth data and $V'^{C}$ is the network prediction.

(3) Human body joint point regression;

With the human body mesh model $V'^{C}$ as input, a general SMPL model regressor outputs the human body joint point coordinates corresponding to the human body model.

A joint loss is then introduced to constrain the generation of the human body model.

Specifically, the joint regression matrix of the SMPL model, SMPL.J_regressor $\in \mathbb{R}^{24 \times 6890}$, is applied to the estimated human body mesh model of dimension $6890 \times 3$,

and the coordinates of the human body joint points are obtained by regression.
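Given the stated dimensions, the regression amounts to a matrix product of the $24 \times 6890$ joint regressor with the $6890 \times 3$ vertex matrix. The NumPy sketch below uses a random, row-normalized stand-in for the regressor, since the real matrix ships with the SMPL model files and is not reproduced here; the stand-in, the random mesh and the ground-truth joints are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_JOINTS, NUM_VERTICES = 24, 6890

# Stand-in for SMPL.J_regressor: rows are normalised to sum to one so that each
# joint is a weighted average of mesh vertices (an assumption of this sketch).
j_regressor = rng.random((NUM_JOINTS, NUM_VERTICES))
j_regressor /= j_regressor.sum(axis=1, keepdims=True)

mesh = rng.standard_normal((NUM_VERTICES, 3))         # estimated mesh vertices
joints = j_regressor @ mesh                           # (24, 3) regressed joints

# Joint regression loss against hypothetical ground-truth joints (L1 norm).
gt_joints = rng.standard_normal((NUM_JOINTS, 3))
joint_regression_loss = np.abs(joints - gt_joints).sum()

print(joints.shape)  # (24, 3)
```

Because the regressor is a fixed linear map, the loss on the regressed joints propagates directly back to the mesh vertices, which is how the joints act as a reconstruction constraint on the mesh.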

The human body posture estimation and shape reconstruction results of the technical scheme of the invention are more accurate: because an attention mechanism processes the heat map distributions of the different direction dimensions, the heat map information of those dimensions is used more effectively to guide posture estimation and shape reconstruction. The model training time is also shorter: the coarse-to-fine step-by-step generation scheme requires less computation than methods that adjust and optimize directly on the basis of a human body model template.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present invention.

Based on the same idea as the image-based step-by-step generation type human body reconstruction method in the above embodiment, the invention also provides an image-based step-by-step generation type human body reconstruction system, which can be used to execute that method. For convenience of explanation, the schematic structural diagram of the system embodiment shows only the parts related to the embodiment of the present invention. Those skilled in the art will understand that the illustrated structure does not limit the device, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.

As shown in FIG. 5, another embodiment of the present application provides an image-based step-by-step generation type human body reconstruction system 100, which includes a human body posture estimation module 101, a human body mesh generation module 102 and a human body joint point regression module 103.

The human body posture estimation module 101 is configured to extract image features of a given human body image, decode the image features to generate heat maps in the x, y and z direction dimensions, concatenate the heat map distributions of the three direction dimensions, separate them again after encoding, and perform heat map integration to regress the coordinates of the human body joint points.

The human body mesh generation module 102 is configured to pass the heat map distribution of the human body joint points through a Transformer encoder for information interaction between the human body joint points and the vertices of the human body mesh model, then through upsampling of the mesh model vertices and batch regularization, so that the final human body mesh model is generated step by step.

The human body joint point regression module 103 is configured to input the vertex coordinates of the human body mesh model into the general SMPL model regressor and output the human body joint point coordinates corresponding to the human body model as reconstruction constraints of the human body mesh model.

The three modules of the system are trained in two stages. The first stage trains the parameters of the human body posture estimation module: given an input data set, the human body joint point coordinates $P'^{C}$ are obtained through the human body posture estimation process, and joint loss and bone loss are used as training indexes. The joint loss is defined as $L_{joint} = \|P'^{C} - P^{C}\|_{1}$,

wherein $P^{C}$ is the ground-truth data and $P'^{C}$ is the network prediction. In this stage, the data flow does not pass through the human body mesh generation module or the human body joint point regression module. The second stage trains the human body mesh generation module and the human body joint point regression module, with the parameters of the human body posture estimation module fixed. The heat map distributions in the x, y and z directions estimated by the human body posture estimation module are input into the human body mesh generation module, which estimates the human body mesh model $M'^{C}$; $M'^{C}$ is then passed through the human body joint point regression module, which outputs the regressed joint coordinates.

In this stage, vertex loss and joint regression loss are used as training indexes. The vertex loss is defined as $L_{vertex} = \|M'^{C} - M^{C}\|_{1}$.

it should be noted that, the image-based human body reconstruction method system of the present invention corresponds to the image-based human body reconstruction method of the present invention, and the technical features and the advantages thereof described in the above embodiment of the image-based human body reconstruction method of the present invention are all applicable to the embodiment of the image-based human body reconstruction method of the step-by-step generation formula.

In addition, in the implementation of the image-based step-by-step generation human body reconstruction system of the above embodiment, the logical division into program modules is only an example. In practical applications, the above functions may be allocated to different program modules as needed, for example to meet the configuration requirements of the corresponding hardware or for convenience of software implementation; that is, the internal structure of the system may be divided into different program modules to perform all or part of the functions described above.

As shown in fig. 6, in an embodiment, an electronic device for implementing a human body reconstruction method based on image stepwise generation is provided, and the electronic device 200 may include a first processor 201, a first memory 202 and a bus, and may further include a computer program, such as a human body reconstruction program 203 based on image stepwise generation, stored in the first memory 202 and executable on the first processor 201.

The first memory 202 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The first memory 202 may in some embodiments be an internal storage unit of the electronic device 200, such as a removable hard disk of the electronic device 200. The first memory 202 may also be an external storage device of the electronic device 200 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 200. Further, the first memory 202 may also include both an internal storage unit and an external storage device of the electronic device 200. The first memory 202 may be used not only to store application software installed in the electronic device 200 and various types of data, such as codes of the human body reconstruction program 203 of the image-based step-by-step generation type, etc., but also to temporarily store data that has been output or is to be output.

The first processor 201 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit or a plurality of packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The first processor 201 is the control unit of the electronic device: it connects the components of the whole electronic device through various interfaces and lines, and executes the functions of the electronic device 200 and processes its data by running or executing programs or modules stored in the first memory 202 (e.g., the image-based step-by-step generation human body reconstruction program 203) and calling data stored in the first memory 202.

Fig. 6 shows only an electronic device with certain components; those skilled in the art will appreciate that the structure shown in fig. 6 does not limit the electronic device 200, which may include fewer or more components than shown, combine certain components, or arrange the components differently.

The human body reconstruction program 203 based on image stepwise generation stored in the first memory 202 of the electronic device 200 is a combination of a plurality of instructions, which when executed in the first processor 201, can realize:

generating the final human body mesh model step by step after the heat map distribution undergoes information interaction between joint points and model vertices, human body model vertex upsampling, and batch regularization;

and inputting the vertex coordinates of the human body mesh model into a general SMPL model regressor, outputting the human body joint point coordinates corresponding to the human body model, and completing the human body reconstruction.

Further, the modules/units integrated in the electronic device 200, if implemented in the form of software functional units and sold or used as separate products, may be stored in a non-volatile computer-readable storage medium. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. An image-based step-by-step generation human body reconstruction method, comprising the following steps:

extracting image features of a given human body image, decoding the image features to generate heat maps in the x, y and z directions, concatenating the heat map distributions of the x, y and z directions, encoding them, separating them again into the x, y and z directions, and performing heat map integral regression to obtain the coordinates of the human body joint points;

2. The image-based step-by-step generation human body reconstruction method according to claim 1, wherein extracting the image features of the given human body image specifically comprises:

obtaining the heat map distribution in the x-axis direction, $P^{H,x} \in \mathbb{R}^{J \times 8h}$; similarly, in the y-axis direction, an averaging operation $avg^{x}$ is performed over the x-axis dimension, followed by a one-dimensional convolution

to obtain the heat map distribution in the y-axis direction, $P^{H,y} \in \mathbb{R}^{J \times 8w}$;

in the z-axis direction, an averaging operation $avg^{x,y}$ is performed over the x-axis and y-axis dimensions, followed by a one-dimensional convolution $f_p$ and a feature reshaping operation

to obtain the heat map distribution in the z-axis direction, $P^{H,z} \in \mathbb{R}^{J \times D}$.

3. The image-based step-by-step generation human body reconstruction method according to claim 2, wherein the heat map distributions of the three dimensions x, y and z are concatenated, encoded by a Transformer encoder, and then separated again into heat map distributions of the three dimensions x, y and z, specifically:

the heat map distributions of the three dimensions x, y and z, $P^{H,x}, P^{H,y}, P^{H,z} \in \mathbb{R}^{J \times 64}$, are concatenated along the last dimension to obtain the fused feature $P^{H} = [P^{H,x}, P^{H,y}, P^{H,z}] \in \mathbb{R}^{J \times 192}$, which contains the heat map distribution information of all three directional dimensions; the fused feature is then used as the input of a Transformer encoder;

the output heat map distribution $P'^{H}$ is passed through independent fully connected layers $fc^{x}$, $fc^{y}$, $fc^{z}$ for feature mapping, summed with the original heat map distributions $P^{H,x}$, $P^{H,y}$, $P^{H,z}$, and separated again into the heat map distributions of the three dimensions x, y and z, $P'^{H,x}$, $P'^{H,y}$, $P'^{H,z}$:

$P'^{H,x} = P^{H,x} + fc^{x}(P'^{H})$

$P'^{H,y} = P^{H,y} + fc^{y}(P'^{H})$

$P'^{H,z} = P^{H,z} + fc^{z}(P'^{H})$

for the heat map distributions of the three directional dimensions of each joint point, coordinate regression is performed using soft-argmax to obtain $P'^{C,x}, P'^{C,y}, P'^{C,z} \in \mathbb{R}^{J \times 1}$, which are then concatenated to obtain the final human body joint point coordinates $P'^{C} = [P'^{C,x}, P'^{C,y}, P'^{C,z}] \in \mathbb{R}^{J \times 3}$.
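The residual re-separation and coordinate regression described in this claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the Transformer encoder is replaced by an identity stand-in, the fully connected layers use random untrained weights, `J = 17` is a hypothetical joint count, and `soft_argmax` uses the commonly seen form (a softmax-weighted expectation over bin indices), which may differ from the definition intended by the claims.

```python
import numpy as np


def soft_argmax(heatmap: np.ndarray) -> np.ndarray:
    """Softmax-weighted expected bin index along the last axis; returns (J, 1)."""
    z = heatmap - heatmap.max(axis=-1, keepdims=True)   # stabilise the exponent
    prob = np.exp(z)
    prob /= prob.sum(axis=-1, keepdims=True)
    idx = np.arange(heatmap.shape[-1], dtype=float)
    return (prob * idx).sum(axis=-1, keepdims=True)


J, B = 17, 64                      # J is hypothetical; B = 64 bins as in R^{J x 64}
rng = np.random.default_rng(0)

# Per-axis heat map distributions P^{H,x}, P^{H,y}, P^{H,z}.
P_x, P_y, P_z = (rng.normal(size=(J, B)) for _ in range(3))
P_H = np.concatenate([P_x, P_y, P_z], axis=-1)           # fused feature, (J, 192)

# Stand-in for the Transformer encoder output P'^H (identity; an assumption).
P_H_enc = P_H

# Independent fully connected layers fc^x, fc^y, fc^z mapping 192 -> 64.
W = {k: rng.normal(scale=0.01, size=(3 * B, B)) for k in "xyz"}
P_x_new = P_x + P_H_enc @ W["x"]   # P'^{H,x} = P^{H,x} + fc^x(P'^H)
P_y_new = P_y + P_H_enc @ W["y"]
P_z_new = P_z + P_H_enc @ W["z"]

# Regress one coordinate per axis and concatenate into P'^C of shape (J, 3).
P_C = np.concatenate(
    [soft_argmax(P_x_new), soft_argmax(P_y_new), soft_argmax(P_z_new)], axis=-1
)
```

Because the expectation is differentiable, this form lets the coordinate loss propagate back into the heat maps, which is the usual reason for preferring it over a hard argmax.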

4. The image-based step-by-step generation human body reconstruction method according to claim 3, wherein soft-argmax is defined as follows:

5. The image-based step-by-step generation human body reconstruction method according to claim 1, wherein the information interaction between the human body joint points and the human body mesh model vertices specifically comprises:

given the heat map feature $F_{H} = [P'^{H,x}, P'^{H,y}, P'^{H,z}] \in \mathbb{R}^{J \times 192}$, the feature $F_{embed} = F_{H} + PE$ is obtained by position embedding, where the position embedding $PE$ is defined as follows:

the position-embedded feature $F_{embed}$ is input into a Transformer encoder comprising N blocks, each containing a multi-head attention module and a feed-forward neural network; in each block, $F_{embed}$ first passes through the multi-head attention module, which computes normalized attention weights, and then undergoes feature transformation by the feed-forward neural network; the result is finally output after the N blocks
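The block structure described in this claim (multi-head attention followed by a feed-forward network, repeated N times) can be sketched as below. This is an illustrative sketch under several assumptions and not the claimed encoder: `PE` is a random placeholder because its definition is not reproduced in this text, the residual connections follow the usual Transformer design rather than anything stated in the claim, layer normalization is omitted for brevity, and the sizes `J = 17`, `heads = 4`, `N = 2` and all weights are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention over the J joint tokens; X has shape (J, d)."""
    J, d = X.shape
    dh = d // num_heads
    # Project, then split the feature axis into heads: (heads, J, dh).
    Q = (X @ Wq).reshape(J, num_heads, dh).transpose(1, 0, 2)
    K = (X @ Wk).reshape(J, num_heads, dh).transpose(1, 0, 2)
    V = (X @ Wv).reshape(J, num_heads, dh).transpose(1, 0, 2)
    # Normalized attention weights: each row sums to 1.
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))   # (heads, J, J)
    out = (attn @ V).transpose(1, 0, 2).reshape(J, d)        # merge heads
    return out @ Wo, attn


def encoder_block(X, params, num_heads):
    A, attn = multi_head_attention(X, *params["attn"], num_heads)
    X = X + A                                    # residual connection (assumed)
    W1, W2 = params["ffn"]
    X = X + np.maximum(X @ W1, 0.0) @ W2         # feed-forward network with ReLU
    return X, attn


J, d, heads, N = 17, 192, 4, 2                   # hypothetical sizes
rng = np.random.default_rng(0)


def init_block():
    s = 0.02                                     # small random, untrained weights
    return {
        "attn": [rng.normal(scale=s, size=(d, d)) for _ in range(4)],
        "ffn": [rng.normal(scale=s, size=(d, 4 * d)),
                rng.normal(scale=s, size=(4 * d, d))],
    }


F_H = rng.normal(size=(J, d))                    # heat map feature, (J, 192)
PE = rng.normal(scale=0.02, size=(J, d))         # placeholder position embedding
F_embed = F_H + PE

X = F_embed
for _ in range(N):
    X, attn = encoder_block(X, init_block(), heads)
```

Each of the J rows acts as one token, so the attention matrix of shape (J, J) per head is where information is exchanged between joints.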

6. The image-based stepwise generation human body reconstruction method according to claim 1, wherein the human body model vertex upsampling specifically comprises:

outputting processed by a Transformer encoder

7. The image-based step-by-step generation human body reconstruction method according to claim 1, wherein the vertex coordinates of the human body mesh model are input into a general SMPL model regressor, and the human body joint point coordinates corresponding to the human body model are output to complete the human body reconstruction, specifically:

and obtaining the coordinates of the human body joint points by regression.

8. An image-based step-by-step generation human body reconstruction system, characterized by being applied to the image-based step-by-step generation human body reconstruction method of any one of claims 1 to 7, and comprising a human body posture estimation module, a human body mesh generation module and a human body joint point regression module;

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

a memory storing computer program instructions executable by the at least one processor to cause the at least one processor to perform the image-based step-by-step generation human body reconstruction method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the image-based step-by-step generation human body reconstruction method of any one of claims 1 to 7.