WO2023019478A1 - Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium - Google Patents


Info

Publication number
WO2023019478A1
Authority
WO
WIPO (PCT)
Prior art keywords: reconstructed, functional module, graph, output, human body
Application number
PCT/CN2021/113308
Other languages: French (fr), Chinese (zh)
Inventors: 王磊, 刘薰裕, 马晓亮, 刘宝玉, 程俊
Original Assignee: 深圳先进技术研究院, 中国科学院深圳理工大学(筹)
Application filed by 深圳先进技术研究院 and 中国科学院深圳理工大学(筹); priority to PCT/CN2021/113308; published as WO2023019478A1.


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/08 - Learning methods
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics

Definitions

  • the present application belongs to the technical field of image processing, and in particular relates to a three-dimensional reconstruction method, a three-dimensional reconstruction device, electronic equipment, and a computer-readable storage medium.
  • the 3D reconstruction of human body parts has always been a hot topic in computer vision, and has been widely used in the fields of virtual reality (VR) and augmented reality (AR).
  • conventional 3D reconstruction techniques rely on complex and expensive equipment, such as 3D scanners, multi-view cameras or inertial sensors.
  • although 3D reconstruction techniques based on a single image have been developed, these techniques still suffer from problems such as unstable reconstruction effects.
  • the present application provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, an electronic device and a computer-readable storage medium, which can solve the problem of unstable reconstruction effect existing in the existing three-dimensional reconstruction technology.
  • the present application provides a three-dimensional reconstruction method, including:
  • the present application provides a three-dimensional reconstruction device, including:
  • the extraction module is used to perform feature extraction on the image of the object to be reconstructed to obtain a feature vector, wherein the above-mentioned feature vector is used to represent the shape feature information of the above-mentioned object to be reconstructed;
  • a generation module configured to generate a feature map according to the above-mentioned feature vector and a preset template for the above-mentioned object to be reconstructed, and the above-mentioned preset template is used to represent the three-dimensional structure information of the above-mentioned object to be reconstructed;
  • the reconstruction module is configured to input the above-mentioned feature map into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the above-mentioned object to be reconstructed.
  • the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor.
  • when the processor executes the computer program, the steps of the method in the above-mentioned first aspect are implemented.
  • the present application provides a computer-readable storage medium, wherein the above-mentioned computer-readable storage medium stores a computer program, and when the above-mentioned computer program is executed by one or more processors, the steps of the method in the above-mentioned first aspect are implemented.
  • the present application provides a computer program product, which, when the computer program product is run on an electronic device, causes the electronic device to execute the three-dimensional reconstruction method proposed in the first aspect.
  • the beneficial effect of the present application compared with the prior art is: for the image of the object to be reconstructed, the present application first performs feature extraction on the image to obtain a feature vector characterizing the shape feature information of the object to be reconstructed, then combines the feature vector with a preset template for the object to be reconstructed to generate a feature map, and finally inputs the feature map into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • the finally generated feature map not only contains the shape features of the object to be reconstructed, but also obtains the three-dimensional structure information of the object to be reconstructed displayed by the preset template,
  • the trained graph convolutional neural network can better process the feature map, and the obtained 3D reconstruction results will be more accurate, ensuring the stability of 3D reconstruction.
  • Fig. 1 is a schematic diagram of the implementation flow of the three-dimensional reconstruction method provided by the embodiment of the present application;
  • FIG. 2 is an example diagram of a preset template when the object to be reconstructed is a human body provided by the embodiment of the present application;
  • Fig. 3 is a schematic structural diagram of the first functional module of the graph convolutional neural network provided by the embodiment of the present application;
  • Fig. 4 is a schematic structural diagram of the i-th functional module of the graph convolutional neural network provided by the embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of the Nth functional module of the graph convolutional neural network provided by the embodiment of the present application.
  • Fig. 6 is an example diagram of the overall structure of the graph convolutional neural network provided by the embodiment of the present application.
  • FIG. 7 is an example diagram of the working framework of the three-dimensional reconstruction method provided by the embodiment of the present application.
  • FIG. 8 is a structural block diagram of a three-dimensional reconstruction device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application proposes a 3D reconstruction method, a 3D reconstruction device, an electronic device, and a computer-readable storage medium. After the feature vector of the object to be reconstructed is extracted from the image, it can be combined with a preset template representing the three-dimensional structure information of the object to be reconstructed to obtain a feature map. The feature map thus contains, on top of the shape features of the object to be reconstructed, the three-dimensional structure information displayed by the preset template; it can therefore be better processed by the trained graph convolutional neural network, yielding more accurate 3D reconstruction results and ensuring the stability of 3D reconstruction.
  • specific examples will be used below to illustrate.
  • Step 101: perform feature extraction on an image of an object to be reconstructed to obtain a feature vector.
  • the electronic device may photograph the object to be reconstructed with its own camera to obtain the image of the object to be reconstructed; alternatively, a third-party device equipped with a camera may photograph the object and transmit the resulting image to the electronic device in a wireless or wired manner. The way in which the image of the object to be reconstructed is acquired is not limited here.
  • after the electronic device obtains the image of the object to be reconstructed, it can perform feature extraction on the image to obtain a feature vector.
  • the feature extraction operation here is mainly aimed at the shape feature information of the object to be reconstructed; that is, the obtained feature vector is actually used to characterize the shape feature information of the object to be reconstructed.
  • the shape feature information includes contour features describing the boundary shape of the object to be reconstructed and/or region features describing the internal shape of the object to be reconstructed, etc., which are not limited herein.
  • before feature extraction, the electronic device can first preprocess the image, for example with segmentation and size-adjustment operations; step 101 may then include the following sub-steps:
  • A1: Segment the image based on the object to be reconstructed to obtain a partial image. Specifically, the electronic device may first perform frame detection on the image, that is, recognize the frame (bounding box) of the object to be reconstructed in the image.
  • the frame is usually a rectangular frame; of course, it can also be a frame of other preset shapes, which is not limited here.
  • the image area within this bounding box can be regarded as a region of interest. Segmenting the image based on the frame separates the region of interest from the rest of the image, removing to a certain extent the large amount of redundant and noise information contained in the background, and yielding a partial image of the object to be reconstructed, with the object located as close to the center of the partial image as possible.
  • the electronic device may use the human body two-dimensional key point detection technology OpenPose to perform border detection on the image.
  • A2: Adjust the size of the partial image to a preset size.
  • the electronic device can unify the size of the partial image: if the size of the partial image is inconsistent with the preset size, the partial image is scaled until the size of the partial image is consistent with the preset size .
  • the preset size may be: 224 ⁇ 224, and the unit is pixel.
  • the number of channels of the partial image is usually 3, which is used to represent three channels of red, green and blue (Red Green Blue, RGB). That is, the final size of the partial image is: 224 ⁇ 224 ⁇ 3.
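As a concrete illustration of sub-steps A1 and A2, the crop-and-resize preprocessing can be sketched as follows. This is a minimal sketch: the bounding box is assumed to be already detected (e.g. by OpenPose), and nearest-neighbour interpolation stands in for whatever resampling an actual implementation uses.

```python
import numpy as np

def preprocess(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the detected bounding box and resize to 224x224x3 (nearest neighbour)."""
    x0, y0, x1, y1 = box                  # bounding box of the object to be reconstructed
    crop = image[y0:y1, x0:x1]            # region of interest
    h, w = crop.shape[:2]
    ys = np.arange(224) * h // 224        # nearest-neighbour row indices
    xs = np.arange(224) * w // 224        # nearest-neighbour column indices
    return crop[ys][:, xs]                # 224 x 224 x 3

img = np.zeros((480, 640, 3), dtype=np.uint8)     # placeholder camera frame
out = preprocess(img, (100, 50, 300, 450))        # hypothetical detected box
print(out.shape)  # (224, 224, 3)
```

The 224×224×3 output matches the preset size stated above and can be fed directly to the encoder.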
  • A3: Perform feature extraction on the resized partial image with an encoder built from a convolutional neural network (CNN) to obtain the feature vector.
  • the electronic device can pre-train the convolutional neural network for classification tasks on a given data set in advance, and the pre-training process can refer to the current general training process for neural networks, which will not be repeated here.
  • the classification layer in the convolutional neural network is removed, and the feature extraction layer before the classification layer is retained, and the retained result constitutes the encoder.
  • the convolutional neural network may be ResNet50
  • the given data set may be the ImageNet data set
  • the output of the encoder of the convolutional neural network is a 2048-dimensional feature vector.
  • Step 102: generate a feature map according to the feature vector and the preset template for the object to be reconstructed.
  • the preset template is used to characterize the three-dimensional structure information of the object to be reconstructed, specifically the three-dimensional structure information of the object to be reconstructed in a specified pose.
  • for different objects to be reconstructed, different preset templates are set: when the object to be reconstructed is a human body, the preset template is a human body mesh; when the object to be reconstructed is a hand, the preset template is a hand mesh.
  • specifically, the electronic device can express the preset template as a graph structure and combine it with the feature vector to generate the feature map.
  • taking the human body as an example, step 102 may include:
  • the human body mesh map adopted by the electronic device for the human body may be a standard template defined by the SMPL (Skinned Multi-Person Linear) model.
  • the human body mesh diagram represents the three-dimensional mesh of the human body under a T-pose (T-Pose).
  • in order to reduce the computational complexity of subsequent graph convolution operations, the electronic device can perform 4× downsampling on the standard template defined by the SMPL model and use the downsampling result as the preset template (that is, the human body mesh). The number of vertices of the human body mesh obtained after 4× downsampling is reduced to 1723. Afterwards, it is only necessary to perform 4× upsampling on the 3D reconstruction result output by the graph convolutional neural network to obtain the final 3D reconstruction result and complete the reconstruction task.
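The 4× down/up-sampling described above can be sketched with precomputed sampling matrices. This is an assumption-laden sketch: real implementations derive D and U once from mesh simplification of the SMPL template, whereas random row-normalized matrices stand in for them here; only the 6890/1723 vertex counts come from the text.

```python
import numpy as np

V_FULL, V_DOWN = 6890, 1723   # SMPL vertex count and its ~4x downsampled count

rng = np.random.default_rng(0)
# Placeholder sampling matrices (assumption): in practice they are precomputed
# once from mesh simplification of the SMPL template.
D = rng.random((V_DOWN, V_FULL)); D /= D.sum(axis=1, keepdims=True)  # downsample
U = rng.random((V_FULL, V_DOWN)); U /= U.sum(axis=1, keepdims=True)  # upsample

template = rng.standard_normal((V_FULL, 3))   # placeholder T-pose vertices
template_down = D @ template                  # preset template: 1723 x 3
reconstructed = U @ template_down             # back to full resolution: 6890 x 3
print(template_down.shape, reconstructed.shape)
```

The same U matrix is what turns the network's 1723-vertex output back into a full 6890-vertex mesh at the end of the pipeline.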
  • M_h(V_h, A_h), V_h ∈ R^{1723×3}, A_h ∈ {0,1}^{1723×1723}, where:
  • M_h represents the vertex information matrix of the downsampled human body mesh;
  • V_h represents the vertex set of the downsampled human body mesh;
  • V_h ∈ R^{1723×3} indicates that the three-dimensional coordinates of the 1723 vertices contained in the vertex set are all real numbers;
  • A_h represents the adjacency matrix of the downsampled human body mesh; the meaning of the adjacency matrix has been explained above and will not be repeated here;
  • A_h ∈ {0,1}^{1723×1723} indicates that the values of the elements in the adjacency matrix are 0 or 1.
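An adjacency matrix of this kind can be built directly from the mesh's edge list; a toy sketch (the four edges here are illustrative, not the SMPL connectivity):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # toy mesh connectivity (assumption)
n = 4                                      # toy vertex count
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1                  # undirected mesh: symmetric 0/1 matrix
print(A.sum())  # 8: each of the 4 edges is counted twice
```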
  • assuming that the 2048-dimensional feature vector obtained in step 101 is f ∈ R^{2048} and the vertex information matrix of the 4× downsampled human body mesh obtained in step B1 is M_h ∈ R^{1723×3}, fusing the two yields the feature map, which is the input of the subsequent graph convolutional neural network and can be expressed as F_in ∈ R^{1723×2051}.
  • the above operation can be understood as: splicing the 2048-dimensional feature vector to each vertex.
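The splicing operation above amounts to tiling the same image feature vector onto every template vertex; a minimal numpy sketch:

```python
import numpy as np

f = np.random.randn(2048)          # image feature vector from the encoder
M_h = np.random.randn(1723, 3)     # downsampled template vertex coordinates

# Splice the 2048-dim feature vector onto each of the 1723 vertices:
F_in = np.concatenate([M_h, np.tile(f, (1723, 1))], axis=1)
print(F_in.shape)  # (1723, 2051)
```

The 2051 columns are the 3 vertex coordinates plus the 2048 shared image features, matching F_in ∈ R^{1723×2051} above.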
  • the process of generating the feature map is similar to steps B1 and B2, except that the preset template has been changed.
  • the hand mesh adopted by the electronic device for the hand may be a standard template defined by the MANO (hand Model with Articulated and Non-rigid defOrmations) model. It should be noted that since the hand mesh contains a relatively small number of vertices (just over 700), there is no need to perform downsampling operations on it.
  • the feature map actually combines the vertex position information of the preset template and the shape feature information of the part to be reconstructed represented in the image.
  • Step 103: input the feature map into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • the graph convolutional neural network will use the feature map as input, and finally output the transformed grid vertex position information of the object to be reconstructed as the 3D reconstruction result.
  • the following is an introduction to the overall structure of the graph convolutional neural network:
  • the graph convolutional neural network includes N functional modules in series; wherein, the input of the first functional module is the input of the graph convolutional neural network, and the output of the Nth functional module is the output of the graph convolutional neural network, and N is an integer greater than 2.
  • the first functional module is mainly used to receive the input of the graph convolutional neural network
  • the i-th functional module is mainly used for data calculation and transmission operations
  • the N-th functional module is mainly used to output the finally predicted mesh vertex position information of the object to be reconstructed, where i is an integer greater than 1 and less than N.
  • each functional module includes the following three basic units, namely: convolution unit, normalization unit and activation function unit.
  • Fig. 3 shows the schematic structure of the first functional module.
  • the first functional module includes at least three specified structures (only three are shown in Figure 3), connected in series, where each specified structure consists of a convolution unit, a normalization unit and an activation function unit connected in sequence. Among the at least three specified structures: the input of the first specified structure is the input of the first functional module (that is, the input of the graph convolutional neural network); the residual between the output of the last specified structure and the input of the first functional module (that is, the input of the graph convolutional neural network) is the output of the first functional module.
  • FIG. 4 shows a structural diagram of the i-th functional module.
  • the i-th functional module includes at least two specified structures (only two are shown in Figure 4), connected in series; each specified structure is the same as in the first functional module and is not described again here.
  • the input of the first specified structure is the output of the i-1th functional module
  • the residual between the output of the last specified structure and the output of the i-1th functional module is the output of the i-th functional module.
  • FIG. 5 shows a schematic structural diagram of the Nth functional module.
  • the Nth functional module includes a convolution unit, a normalization unit, an activation function unit and a second convolution unit connected in series; the input of the first convolution unit is the output of the N-1th functional module, and the residual between the output of the second convolution unit and the output of the N-1th functional module is the output of the Nth functional module. It can be understood that, since the function of the Nth functional module is to output the finally predicted mesh vertex positions of the object to be reconstructed, the data output by its second convolution unit does not require further normalization or activation processing.
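The module structure just described can be sketched as follows. This is only a shape-level sketch: a dense layer stands in for the graph convolution unit, and the feature widths are chosen so the residual connection is valid, since the text does not fix them.

```python
import numpy as np

def specified_structure(x, W):
    """One specified structure: convolution -> normalization -> activation."""
    h = x @ W                                           # stand-in for the convolution unit
    h = (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-5)   # normalization unit
    return np.maximum(h, 0.0)                           # activation function unit (ReLU)

def functional_module(x, weights):
    """Chain of specified structures with a residual back to the module input."""
    h = x
    for W in weights:
        h = specified_structure(h, W)
    return h + x   # "residual between the last output and the module input"

rng = np.random.default_rng(1)
x = rng.standard_normal((1723, 64))                 # toy per-vertex features
# Three structures; the last maps back to the input width so the residual is valid
weights = [rng.standard_normal((64, 128)) * 0.05,
           rng.standard_normal((128, 128)) * 0.05,
           rng.standard_normal((128, 64)) * 0.05]
print(functional_module(x, weights).shape)  # (1723, 64)
```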
  • Figure 6 shows an example of the overall structure of a graph convolutional neural network including 4 functional modules.
  • the parameters f_in and f_out can be used to represent the change in feature dimension of each functional module during the graph convolution operation.
  • for example, the first functional module includes 3 convolution units, and the change of its feature dimension can be expressed as (f_in, f_out1, f_out2, f_out3), where f_in is the feature dimension of the initial input and f_out1, f_out2 and f_out3 are the feature dimensions output by the three convolution units respectively. Since the normalization unit and the activation function unit do not change the feature dimension, the feature dimension input to the second convolution unit equals the feature dimension output by the first convolution unit, and so on.
  • as noted above, when the object to be reconstructed is a human body, the obtained feature map is F_in ∈ R^{1723×2051}.
  • after the feature map passes through the graph convolutional neural network, the final output of the network, F_out ∈ R^{1723×3}, is obtained.
  • after 4× upsampling of F_out, M_out ∈ R^{6890×3} is obtained, that is, the final 3D reconstruction result for the human body.
  • the convolution unit may be a Chebyshev convolution unit, and the Chebyshev convolution unit specifically uses Chebyshev polynomials to construct a Chebyshev convolution algorithm.
  • the processing speed of the graph convolutional neural network can be accelerated to a certain extent, and the efficiency of 3D reconstruction can be improved.
  • the Chebyshev convolution can be expressed as F_out = Σ_{k=0}^{K−1} T_k(L̃) F_in Θ_k, where:
  • F_in ∈ R^{N×f_in} represents the input features;
  • F_out ∈ R^{N×f_out} represents the output features;
  • K represents the use of K-order Chebyshev polynomials, T_k(·) being the k-th Chebyshev polynomial;
  • Θ_k ∈ R^{f_in×f_out} represents the feature change matrix, whose parameters are the values the graph convolutional neural network needs to learn.
  • L̃ represents the scaled Laplacian matrix of the preset template. When the object to be reconstructed is a human body and the preset template is the downsampling result of the standard template defined by the SMPL model, N is the number of vertices after downsampling, namely 1723.
  • the scaled Laplacian matrix is specifically: L̃ = 2L_p/λ_max − I, where:
  • I is an identity matrix
  • ⁇ max is the largest eigenvalue of the L p matrix.
  • L is an intermediate parameter, which has no actual physical meaning.
  • W is the parameter that the graph convolutional neural network needs to learn.
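The K-order Chebyshev convolution with the scaled Laplacian can be sketched in numpy as follows. This is a sketch: dense matrices are used for clarity where real implementations use sparse operations, and the 4-node toy graph is illustrative; L_p is taken to be the normalized graph Laplacian, an assumption consistent with the definitions above.

```python
import numpy as np

def scaled_laplacian(A):
    """L_tilde = 2 * L_p / lambda_max - I, with L_p = I - D^(-1/2) A D^(-1/2)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_p = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
    lam_max = np.linalg.eigvalsh(L_p).max()
    return 2.0 * L_p / lam_max - np.eye(len(A))

def cheb_conv(F_in, A, thetas):
    """F_out = sum_k T_k(L_tilde) @ F_in @ Theta_k via the Chebyshev recurrence."""
    L = scaled_laplacian(A)
    T_prev, T_curr = np.eye(len(A)), L                  # T_0 = I, T_1 = L_tilde
    out = F_in @ thetas[0]                              # k = 0 term
    for k in range(1, len(thetas)):
        out = out + T_curr @ F_in @ thetas[k]
        T_prev, T_curr = T_curr, 2.0 * L @ T_curr - T_prev  # T_{k+1} = 2 L T_k - T_{k-1}
    return out

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)               # toy 4-vertex graph
F_in = rng.standard_normal((4, 5))                      # N=4 vertices, f_in=5
thetas = [rng.standard_normal((5, 2)) * 0.1 for _ in range(3)]  # K=3, f_out=2
print(cheb_conv(F_in, A, thetas).shape)  # (4, 2)
```

The recurrence avoids ever computing matrix powers of L̃ explicitly, which is the efficiency gain the text attributes to the Chebyshev formulation.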
  • for example, the graph convolutional neural network can be trained using two datasets, Human3.6M and MSCOCO. Since these two datasets do not store a real human body mesh for each training sample, but only the positions of the real human body's 3D joints, a high-precision real human body mesh must be fitted in advance from the real human 3D joint positions of each training sample. The fitted real human body mesh can then be used as a strong label in the training process of the graph convolutional neural network. That is to say, the real human body mesh mentioned here is actually a high-precision result fitted from the real human body's 3D joints.
  • the training process of this graph convolutional neural network is basically the same as that of a general neural network, except that a new loss function is used so that the 3D reconstruction results output by the trained model are smoother and more complete, making it more practical.
  • specifically, the loss function is: L = λ_a·L_v + λ_b·L_j + λ_c·L_n + λ_d·L_e, where:
  • ⁇ a , ⁇ b , ⁇ c and ⁇ d are hyperparameters.
  • Lv denotes the mesh loss, which is used to describe the positional difference between the real body mesh and the predicted body mesh.
  • M * represent the position of each vertex of the real human body mesh
  • M represent the position of each vertex of the predicted human body mesh
  • using the L1 loss, the mesh loss L_v is expressed as: L_v = ‖M − M*‖_1.
  • L_j represents the 3D joint loss, which is used to describe the position difference between the real human 3D joints and the predicted human 3D joints; it can likewise be expressed with the L1 loss as L_j = ‖JM − J^{3D*}‖_1, where:
  • J^{3D*} represents the positions of the real human body's 3D joints;
  • JM represents the predicted positions of the human body's 3D joints;
  • J ∈ R^{v×N} represents the matrix that extracts joints from the human body mesh;
  • M represents the position of each vertex of the predicted human body mesh.
  • L_n represents the surface normal loss, which is used to describe the angle difference between the normal vectors of the triangular faces of the real human mesh and those of the predicted human mesh; it can be expressed as L_n = Σ_f Σ_{(i,j)⊂f} |⟨(m_i − m_j)/‖m_i − m_j‖, n_f*⟩|, where:
  • f represents a triangular face of the predicted human body mesh;
  • n_f* represents the unit normal vector of the triangular face corresponding to f in the real human body mesh;
  • m_i and m_j represent the coordinates of two vertices in f.
  • L_e represents the surface edge loss, which is used to describe the length difference between the edge lengths of the triangular faces of the real human mesh and those of the predicted human mesh, where:
  • f represents a triangular face of the predicted human body mesh;
  • m_i and m_j represent the coordinates of two vertices in f;
  • m_i* and m_j* represent the vertex coordinates corresponding to m_i and m_j in the real human body mesh;
  • the surface edge loss L_e is expressed as: L_e = Σ_f Σ_{(i,j)⊂f} | ‖m_i − m_j‖ − ‖m_i* − m_j*‖ |.
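The four loss terms can be sketched directly from their definitions. This is a sketch under stated assumptions: L1 norms are used throughout as described for the mesh loss, the normal and edge losses sum over the three edges of each triangle, and the single-triangle mesh and the joint matrix J are toy stand-ins.

```python
import numpy as np

def mesh_loss(M, M_star):
    """L_v: L1 distance between predicted and real mesh vertices."""
    return np.abs(M - M_star).sum()

def joint_loss(M, J, J3D_star):
    """L_j: joints regressed from the predicted mesh (JM) vs. real 3D joints."""
    return np.abs(J @ M - J3D_star).sum()

def edge_loss(M, M_star, faces):
    """L_e: difference between predicted and real triangle edge lengths."""
    total = 0.0
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            total += abs(np.linalg.norm(M[i] - M[j])
                         - np.linalg.norm(M_star[i] - M_star[j]))
    return total

def normal_loss(M, M_star, faces):
    """L_n: predicted edges should be perpendicular to the real face normals."""
    total = 0.0
    for a, b, c in faces:
        n = np.cross(M_star[b] - M_star[a], M_star[c] - M_star[a])
        n /= np.linalg.norm(n)                       # unit normal of the real face
        for i, j in ((a, b), (b, c), (c, a)):
            e = (M[i] - M[j]) / np.linalg.norm(M[i] - M[j])
            total += abs(e @ n)
    return total

# Toy single-triangle "mesh": a perfect prediction drives every term to ~0.
M_star = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = [(0, 1, 2)]
J = np.array([[0.5, 0.5, 0.0]])                      # toy joint-extraction matrix
total = (mesh_loss(M_star, M_star) + joint_loss(M_star, J, J @ M_star)
         + normal_loss(M_star, M_star, faces) + edge_loss(M_star, M_star, faces))
print(round(total, 6))  # 0.0
```

The full training loss would weight these four terms with the hyperparameters λ_a, λ_b, λ_c and λ_d as given above.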
  • FIG. 7 shows a working frame example of the three-dimensional reconstruction method in the embodiment of the present application by taking the human body as an example to be reconstructed.
  • the working framework consists of two parts, a convolutional neural network-based encoder and a graph convolutional neural network-based human 3D vertex regressor.
  • the original image of the human body is obtained.
  • the original image is used as the initial input, and the partial image is obtained after preprocessing.
  • the partial image is encoded into a set of feature vectors through an encoder based on a convolutional neural network.
  • the set of feature vectors is fused and spliced with the mesh vertex position information in the preset human body mesh to form a feature map, which serves as the input of the graph convolutional neural network; finally, the graph convolutional neural network regresses a new set of mesh vertex positions that conforms to the two-dimensional observation of the human body in the original image, completing the task of three-dimensional reconstruction of the human body.
  • to sum up, feature extraction is first performed on the image of the object to be reconstructed to obtain the feature vector used to characterize the shape feature information of the object to be reconstructed; the feature vector is then combined with the preset template for the object to be reconstructed to generate a feature map, and finally the feature map is input into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • the finally generated feature map not only contains the shape features of the object to be reconstructed, but also obtains the three-dimensional structure information of the object to be reconstructed displayed by the preset template,
  • the trained graph convolutional neural network can better process the feature map, and the obtained 3D reconstruction results will be more accurate, ensuring the stability of 3D reconstruction.
  • an embodiment of the present application further provides a three-dimensional reconstruction device.
  • the three-dimensional reconstruction device 800 includes:
  • the extraction module 801 is configured to perform feature extraction on the image of the object to be reconstructed to obtain a feature vector, wherein the feature vector is used to represent the shape feature information of the object to be reconstructed;
  • the generation module 802 is configured to generate a feature map according to the above-mentioned feature vector and a preset template for the above-mentioned object to be reconstructed, and the above-mentioned preset template is used to represent the three-dimensional structure information of the above-mentioned object to be reconstructed;
  • the reconstruction module 803 is configured to input the above-mentioned feature map into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the above-mentioned object to be reconstructed.
  • the graph convolutional neural network includes N functional modules connected in series;
  • the input of the first above-mentioned functional module is the input of the above-mentioned graph convolutional neural network
  • the output of the Nth above-mentioned functional module is the output of the above-mentioned graph convolutional neural network
  • the above-mentioned N is an integer greater than 2;
  • the above functional modules include a convolution unit, a normalization unit and an activation function unit.
  • the first above-mentioned functional module includes at least three specified structures, and the above-mentioned at least three specified structures are connected in sequence, and the above-mentioned specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit and the above-mentioned activation function unit connected in sequence;
  • the input of the first specified structure is the input of the first above-mentioned function module
  • the residual between the output of the last specified structure and the input of the first above-mentioned function module is the first The output of the above function modules.
  • the i-th above-mentioned functional module includes at least two specified structures, and the above-mentioned at least two specified structures are connected in series, and the above-mentioned specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit and the above-mentioned activation function unit connected in series, and the above-mentioned i is an integer greater than 1 and less than N;
  • the input of the first specified structure is the output of the i-1th above-mentioned functional module
  • the residual between the output of the last specified structure and the output of the i-1th above-mentioned functional module is the output of the i-th above-mentioned functional module.
  • the Nth above-mentioned functional module includes the above-mentioned convolution unit, the above-mentioned normalization unit, the above-mentioned activation function unit, and the above-mentioned convolution unit connected in series;
  • the input of the first above-mentioned convolution unit is the output of the N-1th above-mentioned functional module;
  • the residual between the output of the second above-mentioned convolution unit and the output of the N-1th above-mentioned functional module is the output of the Nth above-mentioned functional module.
  • the above convolution unit is a Chebyshev convolution unit.
  • the above-mentioned object to be reconstructed is a human body
  • the above-mentioned preset template is a human body grid map
  • the above-mentioned generating module 802 includes:
  • a construction unit configured to construct a graph structure in a preset format based on the above-mentioned human body mesh graph, where the graph structure includes the vertex information of the above-mentioned human body mesh graph;
  • the splicing unit is configured to fuse and splice the above-mentioned feature vectors and the above-mentioned graph structure to obtain the above-mentioned feature map.
  • in some embodiments, the above three-dimensional reconstruction device 800 also includes a training module configured to train the above graph convolutional neural network with a loss function combining a mesh loss, a three-dimensional joint loss, a surface normal loss and a surface edge loss, wherein:
  • the above mesh loss is used to describe the position difference between the real human body mesh and the predicted human body mesh;
  • the above three-dimensional joint loss is used to describe the position difference between the real human three-dimensional joints and the predicted human three-dimensional joints;
  • the surface normal loss described above is used to describe the angular difference between the normal vectors of the triangular faces of the real human mesh and the normal vectors of the triangular faces of the predicted human mesh;
  • the surface edge loss described above is used to describe the difference in length between the edge lengths of the triangular faces of the real human mesh and the edge lengths of the triangular faces of the predicted human mesh.
  • the above extraction module 801 includes:
  • a segmentation unit configured to segment the image based on the object to be reconstructed to obtain a partial image
  • an adjustment unit configured to adjust the size of the partial image to a preset size
  • the extraction unit is configured to perform feature extraction on the above-mentioned partial image after size adjustment by using an encoder based on a convolutional neural network, to obtain the above-mentioned feature vector.
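The segment-then-resize preprocessing performed by the extraction module can be sketched as follows; nearest-neighbour resampling is used purely for brevity (a real pipeline would typically use bilinear interpolation, and the bounding box would come from a detector such as OpenPose rather than being hard-coded):

```python
import numpy as np

def crop_and_resize(img, box, size=224):
    """Cut the bounding box (region of interest) out of the image and
    rescale it to size x size (nearest-neighbour, for brevity)."""
    x0, y0, x1, y1 = box
    crop = img[y0:y1, x0:x1]
    rows = np.linspace(0, crop.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).round().astype(int)
    return crop[np.ix_(rows, cols)]

img = np.zeros((480, 640, 3), dtype=np.uint8)      # dummy RGB frame
patch = crop_and_resize(img, box=(100, 50, 420, 430))
print(patch.shape)  # (224, 224, 3), matching the preset 224x224x3 input size
```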
  • feature extraction is first performed on the image of the object to be reconstructed to obtain a feature vector used to characterize the shape feature information of the object to be reconstructed; the feature vector is then combined with the preset template for the object to be reconstructed to generate a feature map; finally, the feature map is input into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • the finally generated feature map not only contains the shape features of the object to be reconstructed, but also carries the three-dimensional structure information of the object to be reconstructed conveyed by the preset template;
  • as a result, the trained graph convolutional neural network can better process the feature map, and the obtained 3D reconstruction result is more accurate, ensuring the stability of 3D reconstruction.
  • an embodiment of the present application further provides an electronic device.
  • the electronic device 9 in the embodiment of the present application includes: a memory 901, one or more processors 902 (only one is shown in Fig. 9), and a computer program stored on the memory 901 and operable on the processor 902.
  • the memory 901 is used to store software programs and units;
  • the processor 902 executes various functional applications and data processing by running the software programs and units stored in the memory 901.
  • the processor 902 implements the following steps by running the above-mentioned computer program stored in the memory 901:
  • the above-mentioned graph convolutional neural network includes N series-connected functional modules
  • the input of the first above-mentioned functional module is the input of the above-mentioned graph convolutional neural network
  • the output of the Nth above-mentioned functional module is the output of the above-mentioned graph convolutional neural network
  • the above-mentioned N is an integer greater than 2;
  • the above functional modules include a convolution unit, a normalization unit and an activation function unit.
  • the first above-mentioned functional module includes at least three specified structures connected in sequence, and each specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit, and the above-mentioned activation function unit connected in series;
  • the input of the first specified structure is the input of the first above-mentioned functional module;
  • the residual between the output of the last specified structure and the input of the first above-mentioned functional module is the output of the first above-mentioned functional module.
  • the i-th functional module includes at least two specified structures connected in sequence, and each specified structure includes the above-mentioned convolution unit, the above-mentioned normalization unit, and the above-mentioned activation function unit connected in series, where i is an integer greater than 1 and less than N;
  • the input of the first specified structure is the output of the i-1th above-mentioned functional module
  • the residual between the output of the last specified structure and the output of the i-1th above-mentioned functional module is the output of the i-th above-mentioned functional module.
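The residual "specified structure" layout described above (convolution, then normalization, then activation, with a skip connection around the whole module) can be sketched as follows. This is a minimal NumPy illustration in which a simplified one-hop graph convolution stands in for the Chebyshev unit; all names and sizes are assumptions for demonstration:

```python
import numpy as np

def specified_structure(X, A_hat, W):
    """One 'specified structure': convolution -> normalization -> activation."""
    h = A_hat @ X @ W                                   # simplified graph convolution
    h = (h - h.mean(axis=0)) / (h.std(axis=0) + 1e-5)   # per-feature normalization
    return np.maximum(h, 0.0)                           # ReLU activation

def functional_module(X, A_hat, weights):
    """Two or more specified structures in series, plus a residual
    (skip) connection from the module input to the module output."""
    h = X
    for W in weights:
        h = specified_structure(h, A_hat, W)
    return h + X                                        # residual connection

# toy graph: 6 nodes, 4 features per node, two specified structures
rng = np.random.default_rng(1)
A = (rng.random((6, 6)) < 0.4).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 1.0)                 # self-loops
A_hat = A / A.sum(axis=1, keepdims=True) # row-normalized adjacency
X = rng.standard_normal((6, 4))
Ws = [rng.standard_normal((4, 4)) for _ in range(2)]
Y = functional_module(X, A_hat, Ws)
print(Y.shape)  # (6, 4)
```

The residual connection requires the module's input and output feature widths to match, which is why the weight matrices here are square.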
  • the Nth functional module includes the above-mentioned convolution unit, the above-mentioned normalization unit, the above-mentioned activation function unit, and a second above-mentioned convolution unit connected in series;
  • the input of the first above-mentioned convolution unit is the output of the (N-1)th above-mentioned functional module;
  • the residual between the output of the second above-mentioned convolution unit and the output of the (N-1)th above-mentioned functional module is the output of the Nth above-mentioned functional module.
  • the above convolution unit is a Chebyshev convolution unit.
  • the above-mentioned object to be reconstructed is a human body, and the preset template is a human body mesh graph;
  • generating a feature map according to the above-mentioned feature vector and the preset template of the object to be reconstructed includes:
  • constructing a graph structure in a preset format based on the above-mentioned human body mesh graph, where the graph structure includes the vertex information of the above-mentioned human body mesh graph;
  • fusing and splicing the above-mentioned feature vector with the above-mentioned graph structure to obtain the above-mentioned feature map.
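One plausible reading of "fusing and splicing" the feature vector with the graph structure is to tile the global image feature over every template vertex and concatenate it with the vertex coordinates. The sketch below assumes the 2048-dimensional encoder output and 6890-vertex human body template mentioned elsewhere in this document; the exact fusion scheme is an assumption:

```python
import numpy as np

def build_feature_map(feat_vec, template_verts):
    """Attach the global image feature vector to every template vertex:
    node features = [vertex xyz | image feature]."""
    tiled = np.tile(feat_vec, (template_verts.shape[0], 1))
    return np.concatenate([template_verts, tiled], axis=1)

feat = np.zeros(2048)        # encoder output (2048-d, per the ResNet50 example)
verts = np.zeros((6890, 3))  # human body template vertices (SMPL has 6890)
fmap = build_feature_map(feat, verts)
print(fmap.shape)  # (6890, 2051): one feature row per graph vertex
```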
  • the processor 902 further implements the following steps when running the above computer program stored in the memory 901:
  • the total loss of the above-mentioned graph convolutional neural network is calculated based on mesh loss, three-dimensional joint loss, surface normal loss, and surface edge loss;
  • the above mesh loss is used to describe the position difference between the real human body mesh and the predicted human body mesh;
  • the above three-dimensional joint loss is used to describe the position difference between the real human three-dimensional joints and the predicted human three-dimensional joints;
  • the surface normal loss described above is used to describe the angular difference between the normal vectors of the triangular faces of the real human mesh and the normal vectors of the triangular faces of the predicted human mesh;
  • the surface edge loss described above is used to describe the difference in length between the edge lengths of the triangular faces of the real human mesh and the edge lengths of the triangular faces of the predicted human mesh.
  • performing feature extraction on the image of the object to be reconstructed to obtain a feature vector includes:
  • segmenting the image based on the object to be reconstructed to obtain a partial image;
  • adjusting the size of the partial image to a preset size;
  • performing feature extraction on the above-mentioned partial image after size adjustment by using an encoder based on a convolutional neural network, to obtain the above-mentioned feature vector.
  • the above-mentioned processor 902 may be a central processing unit (CPU); the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 901 may include read-only memory and random-access memory, and provides instructions and data to the processor 902 . Part or all of the memory 901 may also include non-volatile random access memory. For example, the memory 901 may also store information of device categories.
  • feature extraction is first performed on the image of the object to be reconstructed to obtain a feature vector used to characterize the shape feature information of the object to be reconstructed; the feature vector is then combined with the preset template for the object to be reconstructed to generate a feature map; finally, the feature map is input into the trained graph convolutional neural network to obtain the 3D reconstruction result of the object to be reconstructed.
  • the finally generated feature map not only contains the shape features of the object to be reconstructed, but also carries the three-dimensional structure information of the object to be reconstructed conveyed by the preset template;
  • as a result, the trained graph convolutional neural network can better process the feature map, and the obtained 3D reconstruction result is more accurate, ensuring the stability of 3D reconstruction.
  • the disclosed devices and methods can be implemented in other ways.
  • the system embodiments described above are only illustrative.
  • the division of the above-mentioned modules or units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • when the above integrated units are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the present application realizes all or part of the processes in the methods of the above-mentioned embodiments, which can also be completed by instructing the associated hardware through a computer program.
  • the above-mentioned computer program can be stored in a computer-readable storage medium, and when the computer program is executed by the processor, the steps in the above-mentioned various method embodiments can be realized.
  • the above-mentioned computer program includes computer program code, and the above-mentioned computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the above-mentioned computer-readable storage medium may include: any entity or device capable of carrying the above-mentioned computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer-readable memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
  • the content contained in the above-mentioned computer-readable storage media can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction.
  • in some jurisdictions, for example, computer-readable storage media do not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of image processing, and in particular, discloses a three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, an electronic device, and a computer readable storage medium. The three-dimensional reconstruction method comprises: performing feature extraction on an image of an object to be reconstructed to obtain a feature vector, wherein the feature vector is used for representing shape feature information of said object; generating a feature map according to the feature vector and a preset template for said object, the preset template being used for representing three-dimensional structure information of said object; and inputting the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of said object. By means of the solution of the present application, the stability of three-dimensional reconstruction can be improved.

Description

A three-dimensional reconstruction method, device, electronic equipment and readable storage medium

Technical Field

The present application belongs to the technical field of image processing, and in particular relates to a three-dimensional reconstruction method, a three-dimensional reconstruction device, electronic equipment, and a computer-readable storage medium.

Background Art
The three-dimensional reconstruction of human body parts has long been a hot topic in computer vision, with wide applications in the fields of virtual reality (VR) and augmented reality (AR).
Technical Problem

Traditional 3D reconstruction techniques rely on relatively complex and expensive equipment, such as 3D scanners, multi-view cameras or inertial sensors. Although 3D reconstruction techniques based on a single image have since been developed, they still suffer from problems such as unstable reconstruction results.

Technical Solution

The present application provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, an electronic device and a computer-readable storage medium, which can solve the problem of unstable reconstruction results in existing three-dimensional reconstruction techniques.
In a first aspect, the present application provides a three-dimensional reconstruction method, including:
performing feature extraction on an image of an object to be reconstructed to obtain a feature vector, wherein the feature vector is used to represent the shape feature information of the object to be reconstructed;
generating a feature map according to the feature vector and a preset template for the object to be reconstructed, where the preset template is used to represent the three-dimensional structure information of the object to be reconstructed;
inputting the feature map into a trained graph convolutional network (GCN) to obtain a three-dimensional reconstruction result of the object to be reconstructed.
In a second aspect, the present application provides a three-dimensional reconstruction device, including:
an extraction module, configured to perform feature extraction on an image of an object to be reconstructed to obtain a feature vector, wherein the feature vector is used to represent the shape feature information of the object to be reconstructed;
a generation module, configured to generate a feature map according to the feature vector and a preset template for the object to be reconstructed, where the preset template is used to represent the three-dimensional structure information of the object to be reconstructed;
a reconstruction module, configured to input the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object to be reconstructed.
In a third aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the method of the first aspect when executed by one or more processors.

In a fifth aspect, the present application provides a computer program product which, when run on an electronic device, causes the electronic device to execute the three-dimensional reconstruction method proposed in the first aspect.
Beneficial Effects

Compared with the prior art, the present application has the following beneficial effects: for an image of an object to be reconstructed, the present application first performs feature extraction on the image to obtain a feature vector characterizing the shape feature information of the object to be reconstructed, then combines the feature vector with a preset template for the object to be reconstructed to generate a feature map, and finally inputs the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object to be reconstructed. In this process, through the combination of the feature vector and the preset template, the generated feature map not only contains the shape features of the object to be reconstructed, but also carries the three-dimensional structure information of the object to be reconstructed conveyed by the preset template. As a result, the trained graph convolutional neural network can better process the feature map, the obtained three-dimensional reconstruction result is more accurate, and the stability of three-dimensional reconstruction is guaranteed.
Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative effort.
Fig. 1 is a schematic flowchart of the implementation of the three-dimensional reconstruction method provided by an embodiment of the present application;
Fig. 2 is an example diagram of the preset template when the object to be reconstructed is a human body, provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of the first functional module of the graph convolutional neural network provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of the i-th functional module of the graph convolutional neural network provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of the Nth functional module of the graph convolutional neural network provided by an embodiment of the present application;
Fig. 6 is an example diagram of the overall structure of the graph convolutional neural network provided by an embodiment of the present application;
Fig. 7 is an example diagram of the working framework of the three-dimensional reconstruction method provided by an embodiment of the present application;
Fig. 8 is a structural block diagram of the three-dimensional reconstruction device provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Embodiments of the Present Invention

In the following description, specific details such as particular system structures and technologies are presented for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

At present, existing three-dimensional reconstruction techniques still suffer from unstable reconstruction results. To solve this problem, the embodiments of the present application propose a three-dimensional reconstruction method, a three-dimensional reconstruction device, an electronic device and a computer-readable storage medium. After the feature vector of the object to be reconstructed is extracted from the image, the feature vector is combined with a preset template representing the three-dimensional structure information of the object to be reconstructed to obtain a feature map, so that the feature map contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information conveyed by the preset template. Such a feature map can be better processed by the trained graph convolutional neural network, yielding more accurate three-dimensional reconstruction results and guaranteeing the stability of three-dimensional reconstruction. The technical solution proposed by the present application is illustrated below through specific embodiments.

The three-dimensional reconstruction method proposed in the embodiments of the present application is described below. Referring to Fig. 1, the implementation process of the three-dimensional reconstruction method is detailed as follows:
Step 101: performing feature extraction on an image of an object to be reconstructed to obtain a feature vector.

In the embodiment of the present application, the electronic device may photograph the object to be reconstructed with its own camera to obtain an image of the object to be reconstructed; alternatively, a third-party device equipped with a camera may photograph the object to be reconstructed and, after obtaining the image, transmit it to the electronic device in a wireless or wired manner so that the electronic device obtains the image of the object to be reconstructed. The way in which the image of the object to be reconstructed is acquired is not limited here.

After obtaining the image of the object to be reconstructed, the electronic device can perform feature extraction on the image to obtain a feature vector. Considering that the three-dimensional reconstruction result is mainly intended to restore the pose of the object to be reconstructed, and that the pose is most closely related to the shape features of the object, the feature extraction here is mainly aimed at the shape feature information of the object to be reconstructed; that is, the obtained feature vector is actually used to characterize the shape feature information of the object to be reconstructed. It can be understood that the shape feature information includes contour features describing the boundary shape of the object to be reconstructed and/or region features describing its internal shape, which is not limited here.
In some embodiments, in addition to the image information of the object to be reconstructed, the image may also contain some redundant information and noise. To prevent redundant information and noise from affecting the accuracy of subsequent feature extraction, the electronic device may first preprocess the image, for example by segmentation and size-adjustment operations; step 101 may then include:

A1. Segmenting the image based on the object to be reconstructed to obtain a partial image.

The electronic device may first perform bounding-box detection on the image for the object to be reconstructed, that is, recognize the bounding box of the object to be reconstructed in the image. The bounding box is usually rectangular; of course, it can also have another preset shape, which is not limited here. The image area within the bounding box can be regarded as a region of interest. By segmenting the image based on this bounding box, the region of interest is cut out of the image, which removes, to a certain extent, the large amount of redundant information and noise contained in the image background and yields a partial image mainly containing the object to be reconstructed, with the object located as close as possible to the center of the partial image.

As an example only, when the object to be reconstructed is a human body, the electronic device may use the human-body two-dimensional keypoint detection technique OpenPose to perform bounding-box detection on the image.
A2. Adjusting the size of the partial image to a preset size.

To facilitate the subsequent generation of the feature map, the electronic device may unify the size of the partial image: if the size of the partial image differs from the preset size, the partial image is scaled until it matches the preset size. As an example only, the preset size may be 224×224 pixels. The number of channels of the partial image is usually 3, representing the red, green and blue (RGB) channels; that is, the final size of the partial image is 224×224×3.
A3. Performing feature extraction on the resized partial image with an encoder based on a convolutional neural network (CNN) to obtain the feature vector.

The electronic device may pretrain the convolutional neural network on a classification task over a given data set in advance; the pretraining process can follow the current general training procedure for neural networks and is not repeated here. After pretraining is completed, the classification layer of the convolutional neural network is removed and the feature extraction layers before it are retained; the retained part constitutes the encoder.

As an example only, the convolutional neural network may be ResNet50, the given data set may be the ImageNet data set, and the output of the encoder of the convolutional neural network is a 2048-dimensional feature vector.
Step 102: generating a feature map according to the feature vector and a preset template for the object to be reconstructed.

In the embodiment of the present application, the preset template is used to represent the three-dimensional structure information of the object to be reconstructed, specifically the three-dimensional structure information of the object in a specified pose. Different preset templates are set for different types of objects to be reconstructed. As an example only, when the object to be reconstructed is a human body, the preset template is a human body mesh graph; when the object to be reconstructed is a hand, the preset template is a hand mesh graph.

In some embodiments, considering that the graph convolutional neural network is a neural network structure proposed to better handle graph data structures, the electronic device may express the preset template as a graph structure and combine it with the feature vector to generate the feature map, so as to improve the efficiency of three-dimensional reconstruction by the graph convolutional neural network. Since the preset template is usually a mesh graph, it can be regarded as an undirected graph composed of a vertex set and an edge set, expressed as G = (V, E), where G denotes the mesh graph, V the vertex set, and E the edge set. Moreover, since the mesh is stitched together from triangular faces, the mesh graph can also be regarded as an undirected graph composed of a vertex set and a triangular face set, expressed as M = (V, F), where M denotes the vertex information matrix of the mesh graph, V the vertex set, and F the triangular face set; each triangular face in F is a triangle formed by three vertices in V. It can be understood that the edge-set information of the mesh graph is in fact contained in the triangular face set.
In one application scenario, the object to be reconstructed is a human body and the preset template is a human body mesh; step 101 may then include:

B1. Constructing a graph structure in a preset format based on the human body mesh.
As an example only, the human body mesh adopted by the electronic device may be the standard template defined by the SMPL (Skinned Multi-Person Linear) model. As shown in FIG. 2, this human body mesh represents the three-dimensional human mesh in the T-pose.

Using the general-format graph structure described above, the electronic device can first express the human body mesh as M_smpl = (V, F), where V denotes the vertex set of the human body mesh, containing 6890 vertices in total, and F denotes its triangular-face set, each triangular face being formed by three vertices. Converting this general-format graph structure M_smpl = (V, F) yields the preset-format graph structure M_smpl = (V, A), where V still denotes the vertex set and A denotes the adjacency matrix of the human body mesh, A ∈ {0,1}^(6890×6890), meaning that each element of the adjacency matrix takes the value 0 or 1; specifically, if the i-th vertex and the j-th vertex are connected, then (A)_ij = 1, otherwise (A)_ij = 0. By expressing the preset template of the object to be reconstructed through this preset-format graph structure, the electronic device provides the basis for the subsequent use of the graph convolutional neural network to predict the three-dimensional coordinates of the human body mesh.
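The conversion from the face-set form M = (V, F) to the preset-format adjacency form M = (V, A) can be sketched as follows. This is a minimal NumPy illustration; the function name and the two-triangle toy mesh are hypothetical and not part of the application:

```python
import numpy as np

def adjacency_from_faces(num_vertices, faces):
    """Build a binary adjacency matrix A in {0,1}^(N x N) from a list of
    triangular faces, each face being a triple of vertex indices.
    Two vertices are adjacent iff they share an edge of some triangle."""
    A = np.zeros((num_vertices, num_vertices), dtype=np.int8)
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = 1
            A[b, a] = 1  # undirected graph: A must be symmetric
    return A

# Toy mesh: two triangles sharing the edge (1, 2)
faces = [(0, 1, 2), (1, 2, 3)]
A = adjacency_from_faces(4, faces)
```

For the SMPL template the same construction would be applied to all 6890 vertices (or 1723 after downsampling), typically with a sparse matrix in practice.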
In some embodiments, in order to reduce the computational complexity of the subsequent graph convolution operations, the electronic device can downsample the standard template defined by the SMPL model by a factor of 4 and use the downsampled result as the preset template (that is, the human body mesh). It can be understood that the number of vertices of the human body mesh obtained after 4× downsampling is reduced to 1723. Subsequently, the three-dimensional reconstruction result output by the graph convolutional neural network only needs to be upsampled by a factor of 4 to obtain the final three-dimensional reconstruction result and complete the reconstruction task.

For the downsampled human body mesh, the resulting preset-format graph structure is expressed as follows:

M_h = (V_h, A_h),  V_h ∈ R^(1723×3),  A_h ∈ {0,1}^(1723×1723)

where M_h denotes the vertex information matrix of the downsampled human body mesh; V_h denotes its vertex set, with V_h ∈ R^(1723×3) indicating that the coordinate values of the three-dimensional coordinates of the 1723 vertices in this set are all real numbers; and A_h denotes the adjacency matrix of the downsampled human body mesh, whose meaning has been explained above and is not repeated here, with A_h ∈ {0,1}^(1723×1723) indicating that each element of the adjacency matrix takes the value 0 or 1.
B2. Fusing and concatenating the feature vector with the graph structure to obtain the feature graph.

The 2048-dimensional feature vector obtained in step 101 is f ∈ R^2048, and the vertex information matrix of the 4× downsampled human body mesh obtained in step B1 is M_h ∈ R^(1723×3). Fusing and concatenating the two yields the feature graph, which serves as the input of the subsequent graph convolutional neural network and can be expressed as F_in ∈ R^(1723×2051). The above operation can be understood as concatenating the 2048-dimensional feature vector onto every vertex.
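A minimal sketch of this fusion/concatenation step, assuming a NumPy representation (the variable names are illustrative; in the application V_h would come from the downsampled SMPL template and f from the image encoder):

```python
import numpy as np

# Shapes taken from the application: 1723 template vertices with
# 3-D coordinates, plus a 2048-d image feature vector from the encoder.
V_h = np.zeros((1723, 3))   # per-vertex coordinates of the preset template
f = np.zeros(2048)          # single feature vector for the whole image

# Replicate the same feature vector onto every vertex, then concatenate
# with the per-vertex coordinates: (1723, 3) ++ (1723, 2048) -> (1723, 2051)
F_in = np.concatenate([V_h, np.tile(f, (V_h.shape[0], 1))], axis=1)
```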
In another application scenario, the object to be reconstructed is a hand and the preset template is a hand mesh; the generation of its feature graph is similar to steps B1 and B2, except that the preset template is changed. As an example only, the hand mesh adopted by the electronic device may be the standard template defined by the MANO (hand Model with Articulated and Non-rigid defOrmations) model. It should be noted that, since the hand mesh contains a relatively small number of vertices (usually some 700), there is no need to downsample it.

In summary, the feature graph in fact combines the vertex position information of the preset template with the shape feature information, as represented in the image, of the part to be reconstructed.
Step 103: inputting the feature graph into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the object to be reconstructed.

In this embodiment of the present application, the graph convolutional neural network takes the feature graph as input and finally outputs the transformed mesh vertex position information of the object to be reconstructed as the three-dimensional reconstruction result. The overall structure of the graph convolutional neural network is introduced below:
The graph convolutional neural network includes N functional modules connected in series, where the input of the 1st functional module is the input of the graph convolutional neural network, the output of the N-th functional module is the output of the graph convolutional neural network, and N is an integer greater than 2. It can be understood that the 1st functional module mainly receives the input of the graph convolutional neural network, the i-th functional module mainly performs data computation and transfer, and the N-th functional module mainly outputs the finally predicted mesh vertex position information of the object to be reconstructed, where i is an integer greater than 1 and less than N.

Specifically, each functional module includes the following three kinds of basic units: a convolution unit, a normalization unit, and an activation function unit. The specific structure of each functional module is introduced below:
Please refer to FIG. 3, which shows the structure of the 1st functional module. The 1st functional module includes at least three specified structures (only three are shown in FIG. 3) connected in series, each specified structure consisting of a convolution unit, a normalization unit, and an activation function unit connected in series. Among these at least three specified structures: the input of the first specified structure is the input of the 1st functional module (that is, the input of the graph convolutional neural network), and the residual between the output of the last specified structure and the input of the 1st functional module (that is, the input of the graph convolutional neural network) is the output of the 1st functional module.
Please refer to FIG. 4, which shows the structure of the i-th functional module. The i-th functional module includes at least two specified structures (only two are shown in FIG. 4) connected in series, each specified structure being identical to the specified structure of the 1st functional module and therefore not described again here. Among these at least two specified structures: the input of the first specified structure is the output of the (i-1)-th functional module, and the residual between the output of the last specified structure and the output of the (i-1)-th functional module is the output of the i-th functional module.
Please refer to FIG. 5, which shows the structure of the N-th functional module. The N-th functional module includes a convolution unit, a normalization unit, an activation function unit, and a convolution unit connected in series, where the input of the first convolution unit is the output of the (N-1)-th functional module, and the residual between the output of the second convolution unit and the output of the (N-1)-th functional module is the output of the N-th functional module. It can be understood that, since the role of the N-th functional module is to output the finally predicted mesh vertex position information of the object to be reconstructed, the data output by its second convolution unit no longer needs normalization or activation processing.
Please refer to FIG. 6, which gives an example of the overall structure of a graph convolutional neural network including 4 functional modules. For ease of understanding, the parameters f_in and f_out can be used to express how the feature dimension changes as each functional module performs its graph convolution operations. Taking FIG. 6 as an example, the 1st functional module includes 3 convolution units, so the change in its feature dimension can be expressed in the form (f_in, f_out1, f_out2, f_out3), where f_in is the feature dimension of the initial input and f_out1, f_out2, and f_out3 are the feature dimensions output by the three convolution units, respectively. Since the normalization unit and the activation function unit do not change the feature dimension, the feature dimension input to the second convolution unit equals the feature dimension output by the first convolution unit, and so on. Still taking FIG. 6 as an example, the 2nd, 3rd, and 4th functional modules each include 2 convolution units, so the change in their feature dimensions can be expressed in the form (f_in, f_out1, f_out2); the meanings of the parameters are as described above and are not repeated here.

As an example only, when the object to be reconstructed is a human body, the obtained feature graph is F_in ∈ R^(1723×2051). After passing through the graph convolutional neural network, the final network output F_out ∈ R^(1723×3) is obtained. After upsampling, M_out ∈ R^(6890×3) is obtained, that is, the final three-dimensional reconstruction result of the human body.
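The data flow of a functional module built from serial "specified structures" with a residual connection can be sketched as follows. This is a deliberately simplified NumPy illustration, not the application's implementation: the graph convolution unit is reduced to a fixed propagation matrix times a learnable linear map, the normalization unit to plain feature standardization, and the module is assumed to preserve the feature dimension so the residual addition is shape-compatible:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def specified_structure(F, A_hat, theta):
    """One 'specified structure': a (simplified) graph convolution,
    then normalization, then a ReLU activation."""
    H = A_hat @ F @ theta                       # convolution unit (simplified)
    H = (H - H.mean(0)) / (H.std(0) + 1e-5)     # normalization unit
    return relu(H)                              # activation function unit

def functional_module(F, A_hat, thetas):
    """Specified structures in series, plus a residual connection
    from the module input to the module output."""
    H = F
    for theta in thetas:
        H = specified_structure(H, A_hat, theta)
    return H + F                                # residual of output and input

rng = np.random.default_rng(0)
N, d = 6, 8                                     # toy graph: 6 vertices, 8-d features
A_hat = np.eye(N)                               # trivial graph operator for the sketch
F = rng.standard_normal((N, d))
thetas = [rng.standard_normal((d, d)) for _ in range(2)]
out = functional_module(F, A_hat, thetas)
```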
In one example, the convolution unit may be a Chebyshev convolution unit, which uses Chebyshev polynomials to construct the Chebyshev convolution algorithm. The Chebyshev convolution unit can, to a certain extent, speed up the processing of the graph convolutional neural network and improve the efficiency of three-dimensional reconstruction.

The Chebyshev polynomials are defined as follows:

T_0(x) = 1;  T_1(x) = x;  T_{n+1}(x) = 2x·T_n(x) − T_{n−1}(x)

Based on these Chebyshev polynomials, the Chebyshev convolution algorithm is obtained, expressed as follows:

F_out = Σ_{k=0}^{K−1} T_k(L̃) F_in θ_k

where F_in ∈ R^(N×f_in) denotes the input features;

F_out ∈ R^(N×f_out) denotes the output features;

K indicates that Chebyshev polynomials of order K are used; in this embodiment of the present application, every Chebyshev convolution unit of the graph convolutional neural network takes K = 3;

θ_k ∈ R^(f_in×f_out) denotes the feature transformation matrix, whose entries are the values that the graph convolutional neural network needs to learn.
L̃ ∈ R^(N×N) denotes the scaled Laplacian matrix of the preset template. When the object to be reconstructed is a human body and the preset template is the downsampled result of the standard template defined by the SMPL model, N is the number of vertices after downsampling, i.e. 1723. The scaled Laplacian matrix is specifically:

L̃ = 2L_p / λ_max − I

L_p = I − D_h^(−1/2) A_h D_h^(−1/2)

where I is an identity matrix, D_h is the diagonal degree matrix with (D_h)_ii = Σ_j (A_h)_ij, and λ_max is the largest eigenvalue of the matrix L_p.
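The construction of the scaled Laplacian from the adjacency matrix can be sketched as follows (a NumPy illustration on a hypothetical 4-vertex cycle graph; the rescaling maps the eigenvalues of L_p into [−1, 1], which is what the Chebyshev recurrence requires):

```python
import numpy as np

def scaled_laplacian(A):
    """Normalized graph Laplacian L_p = I - D^(-1/2) A D^(-1/2),
    rescaled to L~ = 2 L_p / lambda_max - I so that its spectrum
    lies in [-1, 1]."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, d, 1.0) ** -0.5   # guard isolated vertices
    L_p = np.eye(A.shape[0]) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L_p).max()        # largest eigenvalue of L_p
    return 2.0 * L_p / lam_max - np.eye(A.shape[0])

# Toy 4-vertex cycle graph
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L_tilde = scaled_laplacian(A)
```

For the 1723-vertex template, λ_max would be computed once offline and the operator stored as a sparse matrix.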
For ease of understanding, taking K = 3 as an example, the Chebyshev convolution algorithm given above can be expanded as follows:

F_out = F_in θ_0 + L̃ F_in θ_1 + (2L̃² − I) F_in θ_2

L = [F_in, L̃ F_in, (2L̃² − I) F_in]

F_out = L W

where L is an intermediate quantity without actual physical meaning, and W ∈ R^(3f_in×f_out) (the vertical stack of θ_0, θ_1, and θ_2) is the parameter that the graph convolutional neural network needs to learn.
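A minimal sketch of the Chebyshev convolution using the recurrence T_{k+1}(x) = 2x·T_k(x) − T_{k−1}(x) (NumPy, with hypothetical toy dimensions; a real implementation for the 1723-vertex graph would use sparse matrices):

```python
import numpy as np

def cheb_conv(F_in, L_tilde, thetas):
    """Chebyshev graph convolution with K = len(thetas) terms:
    F_out = sum_k T_k(L~) @ F_in @ theta_k, where T_k(L~) @ F_in is
    computed via the Chebyshev recurrence instead of forming T_k(L~)."""
    Tx = [F_in, L_tilde @ F_in]                      # T_0 F_in, T_1 F_in
    for _ in range(2, len(thetas)):
        Tx.append(2.0 * L_tilde @ Tx[-1] - Tx[-2])   # T_{k+1} = 2 L~ T_k - T_{k-1}
    return sum(T @ th for T, th in zip(Tx, thetas))

rng = np.random.default_rng(1)
N, f_in, f_out = 5, 4, 3
L_tilde = np.zeros((N, N))                           # toy graph operator
F_in = rng.standard_normal((N, f_in))
thetas = [rng.standard_normal((f_in, f_out)) for _ in range(3)]  # K = 3
F_out = cheb_conv(F_in, L_tilde, thetas)
```

With L̃ = 0 the three terms reduce to F_in θ_0, 0, and −F_in θ_2, which makes the recurrence easy to check by hand.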
In some embodiments, when the object to be reconstructed is a human body, the graph convolutional neural network can be trained using the Human3.6M and MSCOCO datasets. Specifically, since these two datasets do not store the ground-truth human mesh of each training sample but only the position information of the ground-truth three-dimensional human joints, a high-precision ground-truth human mesh needs to be fitted in advance from the position information of the ground-truth three-dimensional joints of each training sample; this ground-truth mesh can then be used as a strong label in the training process of the graph convolutional neural network. That is, the ground-truth human mesh mentioned here is in fact a high-precision result fitted from the ground-truth three-dimensional human joints. It can be understood that the training process of this graph convolutional neural network is essentially the same as that of an ordinary neural network, except that a new loss function is used, which makes the three-dimensional reconstruction results output by the trained model smoother and more complete, and thus more practical. The loss function is:

loss = λ_a L_v + λ_b L_j + λ_c L_n + λ_d L_e

where λ_a, λ_b, λ_c, and λ_d are all hyperparameters.
L_v denotes the mesh loss, which describes the positional difference between the ground-truth human mesh and the predicted human mesh. Let M* denote the positions of the vertices of the ground-truth human mesh and M the positions of the vertices of the predicted human mesh; using the L1 loss, the mesh loss L_v is expressed as:

L_v = ||M − M*||_1
L_j denotes the three-dimensional joint loss, which describes the positional difference between the ground-truth and predicted three-dimensional human joints. Let J^3D* denote the positions of the ground-truth three-dimensional human joints, J ∈ R^(v×N) the matrix that extracts joints from the human mesh, and M the positions of the vertices of the predicted human mesh, so that JM is the predicted joint positions; using the L1 loss, the three-dimensional joint loss is expressed as:

L_j = ||JM − J^3D*||_1
L_n denotes the surface normal loss, which describes the angular difference between the normal vectors of the triangular faces of the ground-truth human mesh and those of the predicted human mesh. Let f denote a triangular face of the predicted human mesh, n_f* the unit normal vector of the corresponding triangular face in the ground-truth human mesh, and m_i and m_j the coordinates of two vertices of f; the surface normal loss L_n is then expressed as:

L_n = Σ_f Σ_{(i,j)⊂f} | ⟨ (m_i − m_j) / ||m_i − m_j||, n_f* ⟩ |
L_e denotes the surface edge loss, which describes the difference between the edge lengths of the triangular faces of the ground-truth human mesh and those of the predicted human mesh. Let f denote a triangular face of the predicted human mesh, m_i and m_j the coordinates of two vertices of f, and m_i* and m_j* the corresponding vertex coordinates in the ground-truth human mesh; the surface edge loss L_e is then expressed as:

L_e = Σ_f Σ_{(i,j)⊂f} | ||m_i − m_j|| − ||m_i* − m_j*|| |
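The vertex, joint, and edge loss terms above can be sketched as follows (a NumPy illustration with a hypothetical three-vertex toy mesh; the surface normal loss is omitted for brevity, and all variable names are illustrative):

```python
import numpy as np

def mesh_losses(M_pred, M_gt, J_reg, J_gt, edges):
    """Sketch of the L_v, L_j and L_e loss terms.
    M_pred/M_gt: (N, 3) predicted and ground-truth vertex positions;
    J_reg: (v, N) joint regression matrix J; J_gt: (v, 3) ground-truth
    3-D joints; edges: list of (i, j) vertex-index pairs of the faces."""
    L_v = np.abs(M_pred - M_gt).sum()              # ||M - M*||_1
    L_j = np.abs(J_reg @ M_pred - J_gt).sum()      # ||JM - J^3D*||_1
    L_e = sum(abs(np.linalg.norm(M_pred[i] - M_pred[j])
                  - np.linalg.norm(M_gt[i] - M_gt[j]))
              for i, j in edges)                   # edge-length difference
    return L_v, L_j, L_e

# Toy example: 3 vertices forming one triangle, 1 joint at the centroid
M_gt = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
M_pred = M_gt.copy()                               # perfect prediction
J_reg = np.full((1, 3), 1.0 / 3.0)                 # joint = mean of vertices
J_gt = M_gt.mean(axis=0, keepdims=True)
losses = mesh_losses(M_pred, M_gt, J_reg, J_gt, edges=[(0, 1), (1, 2), (2, 0)])
```

With a perfect prediction all three terms vanish, which is a convenient sanity check before weighting them by λ_a, λ_b, λ_d.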
For ease of understanding, please refer to FIG. 7, which, taking the human body as the part to be reconstructed, gives an example of the working framework of the three-dimensional reconstruction method in the embodiment of the present application. The framework consists of two parts: an encoder based on a convolutional neural network, and a three-dimensional human vertex regressor based on a graph convolutional neural network. After a person is photographed, the original image of the human body is obtained and serves as the initial input; preprocessing yields a partial image, which the convolutional-neural-network-based encoder encodes into a set of feature vectors. These feature vectors are fused and concatenated with the mesh vertex position information of the preset human body mesh to form a feature graph, which is input to the graph convolutional neural network. Finally, the graph convolutional neural network regresses a new set of mesh vertex position information that conforms to the two-dimensional observation of the human body in the original image, completing the three-dimensional reconstruction of the human body.
It can be seen from the above that, in the embodiment of the present application, feature extraction is first performed on the image of the object to be reconstructed to obtain a feature vector characterizing its shape feature information; this feature vector is then combined with the preset template for the object to be reconstructed to generate a feature graph; finally, the feature graph is input into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the object to be reconstructed. In this process, by combining the feature vector with the preset template, the generated feature graph contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information of that object presented by the preset template. As a result, the trained graph convolutional neural network can process this feature graph better, the obtained three-dimensional reconstruction result is more accurate, and the stability of the three-dimensional reconstruction is guaranteed.
Corresponding to the three-dimensional reconstruction method provided above, an embodiment of the present application further provides a three-dimensional reconstruction apparatus. As shown in FIG. 8, the three-dimensional reconstruction apparatus 800 includes:

an extraction module 801, configured to perform feature extraction on an image of an object to be reconstructed to obtain a feature vector, where the feature vector characterizes the shape feature information of the object to be reconstructed;

a generation module 802, configured to generate a feature graph according to the feature vector and a preset template for the object to be reconstructed, where the preset template characterizes the three-dimensional structure information of the object to be reconstructed; and

a reconstruction module 803, configured to input the feature graph into a trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the object to be reconstructed.
Optionally, the graph convolutional neural network includes N functional modules connected in series, where the input of the 1st functional module is the input of the graph convolutional neural network, the output of the N-th functional module is the output of the graph convolutional neural network, and N is an integer greater than 2; each functional module includes a convolution unit, a normalization unit, and an activation function unit.
Optionally, the 1st functional module includes at least three specified structures connected in series, each specified structure including a convolution unit, a normalization unit, and an activation function unit connected in series; among the at least three specified structures, the input of the first specified structure is the input of the 1st functional module, and the residual between the output of the last specified structure and the input of the 1st functional module is the output of the 1st functional module.
Optionally, the i-th functional module includes at least two specified structures connected in series, each specified structure including a convolution unit, a normalization unit, and an activation function unit connected in series, where i is an integer greater than 1 and less than N; among the at least two specified structures, the input of the first specified structure is the output of the (i-1)-th functional module, and the residual between the output of the last specified structure and the output of the (i-1)-th functional module is the output of the i-th functional module.
Optionally, the N-th functional module includes a convolution unit, a normalization unit, an activation function unit, and a convolution unit connected in series; in the N-th functional module, the input of the first convolution unit is the output of the (N-1)-th functional module, and the residual between the output of the second convolution unit and the output of the (N-1)-th functional module is the output of the N-th functional module.
Optionally, the convolution unit is a Chebyshev convolution unit.
Optionally, when the object to be reconstructed is a human body, the preset template is a human body mesh, and the generation module 802 includes:

a construction unit, configured to construct a graph structure in a preset format based on the human body mesh, where the graph structure contains the vertex information of the human body mesh; and

a concatenation unit, configured to fuse and concatenate the feature vector with the graph structure to obtain the feature graph.
Optionally, the three-dimensional reconstruction apparatus 800 further includes:

a calculation module, configured to calculate, during the training of the graph convolutional neural network, the total loss of the graph convolutional neural network based on a mesh loss, a three-dimensional joint loss, a surface normal loss, and a surface edge loss;

where the mesh loss describes the positional difference between the ground-truth human mesh and the predicted human mesh;

the three-dimensional joint loss describes the positional difference between the ground-truth and predicted three-dimensional human joints;

the surface normal loss describes the angular difference between the normal vectors of the triangular faces of the ground-truth human mesh and those of the predicted human mesh; and

the surface edge loss describes the difference between the edge lengths of the triangular faces of the ground-truth human mesh and those of the predicted human mesh.
Optionally, the extraction module 801 includes:

a segmentation unit, configured to segment the image based on the object to be reconstructed to obtain a partial image;

an adjustment unit, configured to resize the partial image to a preset size; and

an extraction unit, configured to perform feature extraction on the resized partial image through an encoder employing a convolutional neural network to obtain the feature vector.
It can be seen from the above that, in the embodiment of the present application, feature extraction is first performed on the image of the object to be reconstructed to obtain a feature vector characterizing its shape feature information; this feature vector is then combined with the preset template for the object to be reconstructed to generate a feature graph; finally, the feature graph is input into the trained graph convolutional neural network to obtain the three-dimensional reconstruction result of the object to be reconstructed. In this process, by combining the feature vector with the preset template, the generated feature graph contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information of that object presented by the preset template. As a result, the trained graph convolutional neural network can process this feature graph better, the obtained three-dimensional reconstruction result is more accurate, and the stability of the three-dimensional reconstruction is guaranteed.
对应于上文所提供的三维重建方法,本申请实施例还提供了一种电子设备。请参阅图9,本申请实施例中的电子设备9包括:存储器901,一个或多个处理器902(图9中仅示出一个)及存储在存储器901上并可在处理器上运行的计算机程序。其中:存储器901用于存储软件程序以及单元,处理器902通过运行存储在存储器901的软件程序以及单元,从而执行各种功能应用以及诊断,以获取上述预设事件对应的资源。具体地,处理器902通过运行存储在存储器901的上述计算机程序时实现以下步骤:Corresponding to the three-dimensional reconstruction method provided above, an embodiment of the present application further provides an electronic device. Referring to Fig. 9, the electronic device 9 in the embodiment of the present application includes: a memory 901, one or more processors 902 (only one is shown in Fig. 9 ) and a computer stored on the memory 901 and operable on the processor program. Wherein: the memory 901 is used to store software programs and units, and the processor 902 executes various functional applications and diagnoses by running the software programs and units stored in the memory 901 to obtain resources corresponding to the above preset events. Specifically, the processor 902 implements the following steps by running the above-mentioned computer program stored in the memory 901:
performing feature extraction on an image of an object to be reconstructed to obtain a feature vector, where the feature vector is used to characterize the shape feature information of the object to be reconstructed;
generating a feature map according to the feature vector and a preset template for the object to be reconstructed, where the preset template is used to characterize the three-dimensional structure information of the object to be reconstructed; and
inputting the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object to be reconstructed.
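The three steps just enumerated can be sketched end to end. The following is an illustrative outline only, not the implementation of the present application: the encoder and the graph convolutional network are stand-in functions, and all dimensions (a 3-channel image, a 6-vertex toy template, a projection back to 3-D coordinates) are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    """Stand-in encoder: collapse the image to a fixed-length feature vector."""
    return image.mean(axis=(0, 1))             # toy pooling -> shape (C,)

def build_feature_map(feat_vec, template_vertices):
    """Attach the global feature vector to every template vertex."""
    v = template_vertices.shape[0]
    tiled = np.tile(feat_vec, (v, 1))          # (V, C)
    return np.concatenate([template_vertices, tiled], axis=1)  # (V, 3 + C)

def graph_cnn(feature_map):
    """Stand-in trained GCN: project per-vertex features back to 3-D coordinates."""
    w = rng.standard_normal((feature_map.shape[1], 3)) * 0.01
    return feature_map @ w                     # (V, 3) reconstructed vertices

image = rng.random((224, 224, 3))              # input image of the object
template = rng.random((6, 3))                  # toy preset template: 6 vertices
fmap = build_feature_map(extract_features(image), template)
mesh = graph_cnn(fmap)
print(fmap.shape, mesh.shape)                  # (6, 6) (6, 3)
```

The feature map keeps one row per template vertex, so the network's output can be read directly as the reconstructed vertex positions.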
Assuming the above is a first possible implementation, then in a second possible implementation provided on the basis of the first, the graph convolutional neural network includes N functional modules connected in series;
the input of the 1st functional module is the input of the graph convolutional neural network, the output of the Nth functional module is the output of the graph convolutional neural network, and N is an integer greater than 2;
each functional module includes a convolution unit, a normalization unit, and an activation function unit.
In a third possible implementation provided on the basis of the second, the 1st functional module includes at least three specified structures connected in series, where each specified structure consists of the convolution unit, the normalization unit, and the activation function unit connected in series;
among the at least three specified structures, the input of the first specified structure is the input of the 1st functional module, and the residual between the output of the last specified structure and the input of the 1st functional module is the output of the 1st functional module.
In a fourth possible implementation provided on the basis of the second, the i-th functional module includes at least two specified structures connected in series, where each specified structure consists of the convolution unit, the normalization unit, and the activation function unit connected in series, and i is an integer greater than 1 and less than N;
among the at least two specified structures, the input of the first specified structure is the output of the (i-1)-th functional module, and the residual between the output of the last specified structure and the output of the (i-1)-th functional module is the output of the i-th functional module.
In a fifth possible implementation provided on the basis of the second, the Nth functional module includes, connected in series in sequence, the convolution unit, the normalization unit, the activation function unit, and another convolution unit;
in the Nth functional module, the input of the first convolution unit is the output of the (N-1)-th functional module, and the residual between the output of the second convolution unit and the output of the (N-1)-th functional module is the output of the Nth functional module.
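The module structure described in the second to fifth implementations can be sketched as follows. This is a minimal illustration under two stated assumptions: a plain linear map stands in for the (graph) convolution unit, and the "residual" is read as a standard skip connection that adds the module input back to the output of the last specified structure; the patent itself only names the residual.

```python
import numpy as np

def specified_structure(x, w):
    """One 'specified structure': a convolution unit (stand-in: linear map),
    a normalization unit, and an activation function unit, in series."""
    y = x @ w                                   # stand-in for a graph convolution
    y = (y - y.mean()) / (y.std() + 1e-6)       # normalization unit
    return np.maximum(y, 0.0)                   # activation function unit (ReLU)

def functional_module(x, weights):
    """A functional module: k specified structures in series, with the module
    input added back at the end (the assumed reading of 'residual')."""
    y = x
    for w in weights:
        y = specified_structure(y, w)
    return y + x                                # residual with the module input

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 8))                 # 6 vertices, 8 features each
weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]  # 1st module: >= 3 structures
out = functional_module(x, weights)
print(out.shape)                                # (6, 8)
```

With this reading, a middle module would pass two or more such structures, and the skip connection lets gradients reach the early modules of the N-module chain directly.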
In a sixth possible implementation provided on the basis of any one of the second to fifth possible implementations, the convolution unit is a Chebyshev convolution unit.
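A Chebyshev convolution unit can be sketched as below, following the standard spectral formulation of Defferrard et al. (2016). Whether the present application uses exactly this formulation, and with which filter order, is not specified here, so the order-3 filter and the approximation lambda_max ≈ 2 are assumptions.

```python
import numpy as np

def chebyshev_conv(x, adj, weights):
    """Chebyshev graph convolution: y = sum_k T_k(L_scaled) @ x @ W_k,
    with T_0 = I, T_1 = L_scaled, T_k = 2 L_scaled T_{k-1} - T_{k-2}.
    lambda_max ~= 2 is assumed, so L_scaled = L - I for the normalized
    Laplacian L."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt  # normalized Laplacian
    lap_scaled = lap - np.eye(adj.shape[0])
    t_prev, t_curr = np.eye(adj.shape[0]), lap_scaled
    out = t_prev @ x @ weights[0] + t_curr @ x @ weights[1]
    for w in weights[2:]:
        t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev
        out += t_curr @ x @ w
    return out

rng = np.random.default_rng(2)
adj = np.array([[0, 1, 1, 0],                   # toy 4-vertex graph
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.standard_normal((4, 5))                 # 5 input features per vertex
weights = [rng.standard_normal((5, 7)) * 0.1 for _ in range(3)]  # order-3 filter
y = chebyshev_conv(x, adj, weights)
print(y.shape)                                  # (4, 7)
```

An order-K filter aggregates information from vertices up to K hops away without ever forming the graph's eigendecomposition, which is what makes this unit practical on a dense human-body mesh.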
In a seventh possible implementation provided on the basis of the first, when the object to be reconstructed is a human body, the preset template is a human body mesh; and generating the feature map according to the feature vector and the preset template for the object to be reconstructed includes:
constructing a graph structure in a preset format based on the human body mesh, the graph structure containing the vertex information of the human body mesh; and
fusing and concatenating the feature vector with the graph structure to obtain the feature map.
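Constructing the graph structure and fusing it with the feature vector can be illustrated as follows. A real system would use a full human-body template (for example an SMPL-style mesh with thousands of vertices); the 4-vertex tetrahedron and the 8-dimensional feature vector below are toy stand-ins.

```python
import numpy as np

def mesh_to_graph(vertices, faces):
    """Build a graph structure from a body mesh: vertex coordinates plus an
    adjacency matrix derived from the triangular faces."""
    n = vertices.shape[0]
    adj = np.zeros((n, n))
    for a, b, c in faces:                       # each triangle contributes 3 edges
        for i, j in ((a, b), (b, c), (c, a)):
            adj[i, j] = adj[j, i] = 1.0
    return vertices, adj

def fuse(feature_vector, graph_vertices):
    """Fuse-and-concatenate: tile the image feature vector onto every vertex."""
    tiled = np.tile(feature_vector, (graph_vertices.shape[0], 1))
    return np.concatenate([graph_vertices, tiled], axis=1)

verts = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])  # toy mesh
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
graph_verts, adj = mesh_to_graph(verts, faces)
feature_map = fuse(np.arange(8.0), graph_verts)  # assumed 8-dim feature vector
print(feature_map.shape, int(adj.sum()))         # (4, 11) 12
```

The adjacency matrix is what the graph convolutions later operate on, while the concatenated per-vertex rows carry both the template's 3-D structure and the image's shape features.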
In an eighth possible implementation provided on the basis of the seventh, the processor 902 further implements the following steps when running the computer program stored in the memory 901:
during training of the graph convolutional neural network, calculating the total loss of the graph convolutional neural network based on a mesh loss, a three-dimensional joint loss, a surface normal loss, and a surface edge loss;
where the mesh loss is used to describe the positional difference between the ground-truth human mesh and the predicted human mesh;
the three-dimensional joint loss is used to describe the positional difference between the ground-truth three-dimensional human joints and the predicted three-dimensional human joints;
the surface normal loss is used to describe the angular difference between the normal vectors of the triangular faces of the ground-truth human mesh and the normal vectors of the triangular faces of the predicted human mesh; and
the surface edge loss is used to describe the difference in length between the edges of the triangular faces of the ground-truth human mesh and the edges of the triangular faces of the predicted human mesh.
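The four loss terms can be sketched as below. The concrete distance functions (L1 here), the cosine form of the angular term, and the equal weighting of the four terms are all assumptions made for illustration; the text above only names the losses, not their exact formulas.

```python
import numpy as np

def face_normals(verts, faces):
    """Unit normal of each triangular face."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def edge_lengths(verts, faces):
    """Lengths of the three edges of each triangular face."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return np.stack([np.linalg.norm(b - a, axis=1),
                     np.linalg.norm(c - b, axis=1),
                     np.linalg.norm(a - c, axis=1)], axis=1)

def total_loss(pred_verts, gt_verts, pred_joints, gt_joints, faces):
    mesh_loss = np.abs(pred_verts - gt_verts).mean()          # vertex positions
    joint_loss = np.abs(pred_joints - gt_joints).mean()       # 3-D joints
    normal_loss = (1.0 - (face_normals(pred_verts, faces) *
                          face_normals(gt_verts, faces)).sum(axis=1)).mean()
    edge_loss = np.abs(edge_lengths(pred_verts, faces) -
                       edge_lengths(gt_verts, faces)).mean()
    return mesh_loss + joint_loss + normal_loss + edge_loss   # equal weights assumed

faces = np.array([[0, 1, 2], [0, 2, 3]])
gt = np.array([[0., 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
joints = np.array([[0.5, 0.5, 0.0]])
loss = total_loss(gt, gt, joints, joints, faces)
print(loss)                                                   # 0.0 for a perfect prediction
```

The normal and edge terms act on derived surface quantities rather than raw coordinates, which is what lets them penalize a mesh that hits the right vertex positions on average but has crumpled or stretched faces.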
In a ninth possible implementation provided on the basis of the first, performing feature extraction on the image of the object to be reconstructed to obtain the feature vector includes:
segmenting the image based on the object to be reconstructed to obtain a partial image;
resizing the partial image to a preset size; and
performing feature extraction on the resized partial image with an encoder employing a convolutional neural network to obtain the feature vector.
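The three preprocessing steps (segment, resize, encode) can be sketched as follows. The bounding-box crop, the nearest-neighbour resize, and the channel-averaging "encoder" are all simplifications standing in for a segmentation model, an interpolating resize, and a CNN backbone; the 224 × 224 preset size is also an assumption.

```python
import numpy as np

def crop_to_object(image, bbox):
    """Segment out the object to be reconstructed via its bounding box
    (a full segmentation mask could be used instead)."""
    x0, y0, x1, y1 = bbox
    return image[y0:y1, x0:x1]

def resize(image, size):
    """Nearest-neighbour resize to the preset size (illustrative; a real
    pipeline would use bilinear interpolation from an image library)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def encode(image):
    """Stand-in CNN encoder: real systems pool the activations of e.g. a
    ResNet backbone into the feature vector."""
    return image.reshape(-1, image.shape[-1]).mean(axis=0)

rng = np.random.default_rng(3)
frame = rng.random((480, 640, 3))
local = crop_to_object(frame, (100, 50, 420, 450))   # assumed person bounding box
local = resize(local, 224)                            # preset size 224 x 224
feat = encode(local)
print(local.shape, feat.shape)                        # (224, 224, 3) (3,)
```

Fixing the input size before encoding keeps the feature vector's length constant regardless of how large the person appears in the original frame.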
It should be understood that in the embodiments of the present application, the processor 902 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 901 may include a read-only memory and a random-access memory, and provides instructions and data to the processor 902. Part or all of the memory 901 may also include a non-volatile random-access memory. For example, the memory 901 may also store information on device categories.
As can be seen from the above, in the embodiments of the present application, feature extraction is first performed on an image of an object to be reconstructed to obtain a feature vector characterizing the shape feature information of the object; the feature vector is then combined with a preset template for the object to generate a feature map; finally, the feature map is input into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object. In this process, because the feature vector is combined with the preset template, the resulting feature map contains not only the shape features of the object to be reconstructed but also the three-dimensional structure information of the object conveyed by the preset template. The trained graph convolutional neural network can therefore process the feature map more effectively, the three-dimensional reconstruction result obtained is more accurate, and the stability of the three-dimensional reconstruction is ensured.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional units and modules described above is merely illustrative. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the above apparatus may be divided into different functional units or modules to accomplish all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are merely for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or described in a given embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of external-device software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as exceeding the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative; the division into the modules or units described above is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer-readable memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (20)

  1. A three-dimensional reconstruction method, comprising:
    performing feature extraction on an image of an object to be reconstructed to obtain a feature vector, wherein the feature vector is used to characterize shape feature information of the object to be reconstructed;
    generating a feature map according to the feature vector and a preset template for the object to be reconstructed, wherein the preset template is used to characterize three-dimensional structure information of the object to be reconstructed; and
    inputting the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object to be reconstructed.
  2. The three-dimensional reconstruction method according to claim 1, wherein the graph convolutional neural network comprises N functional modules connected in series;
    wherein an input of the 1st functional module is an input of the graph convolutional neural network, an output of the Nth functional module is an output of the graph convolutional neural network, and N is an integer greater than 2;
    and wherein each functional module comprises a convolution unit, a normalization unit, and an activation function unit.
  3. The three-dimensional reconstruction method according to claim 2, wherein the 1st functional module comprises at least three specified structures connected in series, each specified structure comprising the convolution unit, the normalization unit, and the activation function unit connected in series;
    among the at least three specified structures, an input of the first specified structure is the input of the 1st functional module, and a residual between an output of the last specified structure and the input of the 1st functional module is an output of the 1st functional module.
  4. The three-dimensional reconstruction method according to claim 2, wherein the i-th functional module comprises at least two specified structures connected in series, each specified structure comprising the convolution unit, the normalization unit, and the activation function unit connected in series, wherein i is an integer greater than 1 and less than N;
    among the at least two specified structures, an input of the first specified structure is an output of the (i-1)-th functional module, and a residual between an output of the last specified structure and the output of the (i-1)-th functional module is an output of the i-th functional module.
  5. The three-dimensional reconstruction method according to claim 2, wherein the Nth functional module comprises, connected in series in sequence, the convolution unit, the normalization unit, the activation function unit, and another convolution unit;
    in the Nth functional module, an input of the first convolution unit is an output of the (N-1)-th functional module, and a residual between an output of the second convolution unit and the output of the (N-1)-th functional module is an output of the Nth functional module.
  6. The three-dimensional reconstruction method according to any one of claims 2 to 5, wherein the convolution unit is a Chebyshev convolution unit.
  7. The three-dimensional reconstruction method according to claim 1, wherein, when the object to be reconstructed is a human body, the preset template is a human body mesh; and generating the feature map according to the feature vector and the preset template for the object to be reconstructed comprises:
    constructing a graph structure in a preset format based on the human body mesh, the graph structure containing vertex information of the human body mesh; and
    fusing and concatenating the feature vector with the graph structure to obtain the feature map.
  8. The three-dimensional reconstruction method according to claim 7, wherein, during training of the graph convolutional neural network, a total loss of the graph convolutional neural network is calculated based on a mesh loss, a three-dimensional joint loss, a surface normal loss, and a surface edge loss;
    wherein the mesh loss is used to describe a positional difference between a ground-truth human mesh and a predicted human mesh;
    the three-dimensional joint loss is used to describe a positional difference between ground-truth three-dimensional human joints and predicted three-dimensional human joints;
    the surface normal loss is used to describe an angular difference between normal vectors of triangular faces of the ground-truth human mesh and normal vectors of triangular faces of the predicted human mesh; and
    the surface edge loss is used to describe a difference in length between edges of the triangular faces of the ground-truth human mesh and edges of the triangular faces of the predicted human mesh.
  9. The three-dimensional reconstruction method according to claim 1, wherein performing feature extraction on the image of the object to be reconstructed to obtain the feature vector comprises:
    segmenting the image based on the object to be reconstructed to obtain a partial image;
    resizing the partial image to a preset size; and
    performing feature extraction on the resized partial image with an encoder employing a convolutional neural network to obtain the feature vector.
  10. A three-dimensional reconstruction apparatus, comprising:
    an extraction module configured to perform feature extraction on an image of an object to be reconstructed to obtain a feature vector, wherein the feature vector is used to characterize shape feature information of the object to be reconstructed;
    a generation module configured to generate a feature map according to the feature vector and a preset template for the object to be reconstructed, wherein the preset template is used to characterize three-dimensional structure information of the object to be reconstructed; and
    a reconstruction module configured to input the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object to be reconstructed.
  11. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    performing feature extraction on an image of an object to be reconstructed to obtain a feature vector, wherein the feature vector is used to characterize shape feature information of the object to be reconstructed;
    generating a feature map according to the feature vector and a preset template for the object to be reconstructed, wherein the preset template is used to characterize three-dimensional structure information of the object to be reconstructed; and
    inputting the feature map into a trained graph convolutional neural network to obtain a three-dimensional reconstruction result of the object to be reconstructed.
  12. The electronic device according to claim 11, wherein the graph convolutional neural network comprises N functional modules connected in series;
    wherein an input of the 1st functional module is an input of the graph convolutional neural network, an output of the Nth functional module is an output of the graph convolutional neural network, and N is an integer greater than 2;
    and wherein each functional module comprises a convolution unit, a normalization unit, and an activation function unit.
  13. The electronic device according to claim 12, wherein the 1st functional module comprises at least three specified structures connected in series, each specified structure comprising the convolution unit, the normalization unit, and the activation function unit connected in series;
    among the at least three specified structures, an input of the first specified structure is the input of the 1st functional module, and a residual between an output of the last specified structure and the input of the 1st functional module is an output of the 1st functional module.
  14. The electronic device according to claim 12, wherein the i-th functional module comprises at least two specified structures connected in series, each specified structure comprising the convolution unit, the normalization unit, and the activation function unit connected in series, wherein i is an integer greater than 1 and less than N;
    among the at least two specified structures, an input of the first specified structure is an output of the (i-1)-th functional module, and a residual between an output of the last specified structure and the output of the (i-1)-th functional module is an output of the i-th functional module.
  15. The electronic device according to claim 12, wherein the Nth functional module comprises, connected in series in sequence, the convolution unit, the normalization unit, the activation function unit, and another convolution unit;
    in the Nth functional module, an input of the first convolution unit is an output of the (N-1)-th functional module, and a residual between an output of the second convolution unit and the output of the (N-1)-th functional module is an output of the Nth functional module.
  16. The electronic device according to any one of claims 12 to 15, wherein the convolution unit is a Chebyshev convolution unit.
  17. The electronic device according to claim 11, wherein, when the object to be reconstructed is a human body, the preset template is a human body mesh; and generating the feature map according to the feature vector and the preset template for the object to be reconstructed comprises:
    constructing a graph structure in a preset format based on the human body mesh, the graph structure containing vertex information of the human body mesh; and
    fusing and concatenating the feature vector with the graph structure to obtain the feature map.
  18. The electronic device according to claim 17, wherein during training of the graph convolutional neural network, a total loss of the graph convolutional neural network is calculated based on a mesh loss, a three-dimensional joint loss, a surface normal loss, and a surface edge loss;
    wherein the mesh loss describes the positional difference between the real human body mesh and the predicted human body mesh;
    the three-dimensional joint loss describes the positional difference between the real three-dimensional human joints and the predicted three-dimensional human joints;
    the surface normal loss describes the angular difference between the normal vectors of the triangular faces of the real human body mesh and those of the predicted human body mesh; and
    the surface edge loss describes the difference in length between the edges of the triangular faces of the real human body mesh and those of the predicted human body mesh.
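The four loss terms can be sketched as follows. The concrete choices here are assumptions, not the claimed formulas: the mesh and joint losses are taken as L1 distances, the normal loss as one minus the cosine of the angle between corresponding face normals, and the weights `w` are purely illustrative:

```python
import numpy as np

def face_normals(verts, faces):
    # Unit normal of each triangular face; verts: (V, 3), faces: (F, 3).
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def edge_lengths(verts, faces):
    # Lengths of the three edges of each triangular face.
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return np.stack([np.linalg.norm(v1 - v0, axis=1),
                     np.linalg.norm(v2 - v1, axis=1),
                     np.linalg.norm(v0 - v2, axis=1)], axis=1)

def total_loss(pred_v, gt_v, pred_j, gt_j, faces, w=(1.0, 1.0, 0.1, 20.0)):
    mesh = np.abs(pred_v - gt_v).mean()        # mesh (vertex position) loss
    joint = np.abs(pred_j - gt_j).mean()       # 3D joint loss
    normal = (1.0 - (face_normals(pred_v, faces)
                     * face_normals(gt_v, faces)).sum(axis=1)).mean()
    edge = np.abs(edge_lengths(pred_v, faces)
                  - edge_lengths(gt_v, faces)).mean()
    return w[0] * mesh + w[1] * joint + w[2] * normal + w[3] * edge
```

When the prediction matches the ground truth exactly, every term, and hence the total, is zero.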
  19. The electronic device according to claim 11, wherein performing feature extraction on the image of the object to be reconstructed to obtain the feature vector comprises:
    segmenting the image based on the object to be reconstructed to obtain a partial image;
    resizing the partial image to a preset size; and
    performing feature extraction on the resized partial image by an encoder employing a convolutional neural network to obtain the feature vector.
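The segment-resize-encode pipeline can be sketched as below. A bounding-box crop stands in for the segmentation step, nearest-neighbour interpolation stands in for the resize, the 224x224 preset size is an assumption, and `encoder` is a placeholder for the CNN encoder rather than any specific network:

```python
import numpy as np

def crop_to_subject(image, bbox):
    # Stand-in for segmentation: keep only the bounding box
    # (x0, y0, x1, y1) around the object to be reconstructed.
    x0, y0, x1, y1 = bbox
    return image[y0:y1, x0:x1]

def resize_nearest(image, out_h, out_w):
    # Nearest-neighbour resize to the preset input size.
    h, w = image.shape[:2]
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return image[rows][:, cols]

def extract_feature_vector(image, bbox, encoder, size=(224, 224)):
    # Crop, resize, then run the CNN encoder to get the feature vector.
    patch = resize_nearest(crop_to_subject(image, bbox), *size)
    return encoder(patch)
```

Cropping first means the encoder's fixed-size input is spent entirely on the subject rather than on background pixels.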
  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the three-dimensional reconstruction method according to any one of claims 1 to 9.
PCT/CN2021/113308 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium WO2023019478A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/113308 WO2023019478A1 (en) 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023019478A1 true WO2023019478A1 (en) 2023-02-23

Family

ID=85239320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113308 WO2023019478A1 (en) 2021-08-18 2021-08-18 Three-dimensional reconstruction method and apparatus, electronic device, and readable storage medium

Country Status (1)

Country Link
WO (1) WO2023019478A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428493A (en) * 2019-07-12 2019-11-08 清华大学 Single image human body three-dimensional method for reconstructing and system based on grid deformation
CN110458957A (en) * 2019-07-31 2019-11-15 浙江工业大学 A kind of three-dimensional image model construction method neural network based and device
US20200126297A1 (en) * 2018-10-17 2020-04-23 Midea Group Co., Ltd. System and method for generating acupuncture points on reconstructed 3d human body model for physical therapy
CN111369681A (en) * 2020-03-02 2020-07-03 腾讯科技(深圳)有限公司 Three-dimensional model reconstruction method, device, equipment and storage medium

Legal Events

Code Title Description
121  Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21953725; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)