CN113724393A - Three-dimensional reconstruction method, device, equipment and storage medium
- Publication number: CN113724393A
- Application number: CN202110924536.2A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20 - Finite element generation, e.g. wire-frame surface description, tesselation
- G06F18/25 - Fusion techniques
- G06F18/253 - Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Processing (AREA)
Abstract
The disclosure relates to a three-dimensional reconstruction method, apparatus, device and storage medium. The method comprises the following steps: acquiring an image to be processed, wherein the image to be processed comprises a target part; inputting the image to be processed into a first network to obtain image features of the image to be processed and thermodynamic diagrams (heatmaps) of M two-dimensional key points of the target part, wherein M is a positive integer; inputting the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system; and inputting the features of the N three-dimensional vertices into a third network to obtain coordinates of K three-dimensional vertices, wherein the coordinates of the K three-dimensional vertices are used for generating a three-dimensional model of the target part in the three-dimensional coordinate system, and N and K are positive integers. A lightweight three-dimensional reconstruction model is thereby realized, which facilitates expanding the application scenarios of three-dimensional reconstruction.
Description
Technical Field
The present disclosure relates to the field of computers, and in particular, to a three-dimensional reconstruction method, apparatus, device, and storage medium.
Background
Three-dimensional Reconstruction (3D Reconstruction) is an important research direction in the field of computer vision, and realizing three-dimensional reconstruction of a target part in an image based on a monocular image has very important theoretical significance and application value. In the related art, when a target part (for example, a human body part such as a hand) in an image is three-dimensionally reconstructed, a convolutional neural network is used to extract image features and predict the shape parameters and posture parameters of a three-dimensional model, and the coordinates of the target part in three-dimensional space are then calculated. Such three-dimensional reconstruction methods usually require a large amount of computation and a large number of parameters, which limits the expansion of the application scenarios of three-dimensional reconstruction.
Disclosure of Invention
The present disclosure provides a three-dimensional reconstruction method, apparatus, device, and storage medium, to at least solve a problem in the related art that a three-dimensional reconstruction method generally requires a relatively high amount of computation and parameters, which results in a limitation on expansion of an application scenario of three-dimensional reconstruction. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a three-dimensional reconstruction method, including:
acquiring an image to be processed, wherein the image to be processed comprises a target part;
inputting an image to be processed into a first network to obtain image characteristics of the image to be processed and thermodynamic diagrams of M two-dimensional key points of a target part, wherein M is a positive integer;
inputting image characteristics of the image to be processed and thermodynamic diagrams of M two-dimensional key points into a second network to obtain characteristics of N three-dimensional vertexes in a three-dimensional coordinate system;
and inputting the characteristics of the N three-dimensional vertexes into a third network to obtain the coordinates of the K three-dimensional vertexes, wherein the coordinates of the K three-dimensional vertexes are used for generating a three-dimensional model of the target part in a three-dimensional coordinate system, and N and K are positive integers.
In one embodiment, the step of obtaining the features of the N three-dimensional vertices in the three-dimensional coordinate system comprises:
for each two-dimensional key point, fusing the thermodynamic diagram of the two-dimensional key point with the image features to obtain the features of the two-dimensional key point;
the method comprises the steps of converting the characteristics of M two-dimensional key points into the characteristics of N three-dimensional vertexes based on a preset mapping matrix, wherein the N three-dimensional vertexes are all or part of K three-dimensional vertexes required by target portion reconstruction, and the mapping matrix represents the mapping relation between the characteristics of the M two-dimensional key points and the characteristics of the N three-dimensional vertexes, wherein N is smaller than or equal to K.
In one embodiment, the step of obtaining coordinates of the K three-dimensional vertices comprises:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertexes, and updating the features of the three-dimensional vertexes through one feature fusion operation after each feature mapping operation to obtain the final features of the K three-dimensional vertexes;
when N is equal to K, performing feature fusion operation once based on the features of the N three-dimensional vertexes to obtain the final features of the K three-dimensional vertexes;
wherein the feature mapping operation step comprises: mapping the features of one set of three-dimensional vertexes into the features of another set of three-dimensional vertexes, wherein the number of the other set of three-dimensional vertexes is larger than that of the one set of three-dimensional vertexes;
the feature fusion operation step comprises: for each current three-dimensional vertex, determining a preset neighborhood corresponding to the three-dimensional vertex, performing first feature fusion processing on the same-dimension features among the features of the three-dimensional vertex and of all three-dimensional vertices in the preset neighborhood to obtain a fusion feature for each dimension, and performing second feature fusion processing on the fusion features of all the dimensions to update the features of the three-dimensional vertex;
and obtaining the coordinates of the K three-dimensional vertexes based on the characteristics of the final K three-dimensional vertexes.
In one embodiment, the value of N is less than or equal to a predetermined value.
In one embodiment, before converting the features of the M two-dimensional key points into the features of the N three-dimensional vertices in the three-dimensional coordinate system based on the preset mapping matrix, the method further includes:
and when N is smaller than K, performing down-sampling on the K three-dimensional vertexes for multiple times to obtain N three-dimensional vertexes.
In one embodiment, before the step of acquiring the image to be processed, the method further comprises:
acquiring an original image;
detecting a target part of the original image;
and expanding the area where the detected target part is located by a preset multiple to obtain the image to be processed.
According to a second aspect of the embodiments of the present disclosure, there is provided a three-dimensional reconstruction apparatus including:
a first acquisition unit configured to perform acquisition of an image to be processed, the image to be processed including a target portion;
the image processing device comprises a first input unit, a second input unit and a third input unit, wherein the first input unit is configured to input an image to be processed into a first network to obtain image characteristics of the image to be processed and thermodynamic diagrams of M two-dimensional key points of a target part, and M is a positive integer;
the second input unit is configured to execute thermodynamic diagrams of image features and M two-dimensional key points of the image to be processed, input the thermodynamic diagrams into a second network, and obtain features of N three-dimensional vertexes in a three-dimensional coordinate system;
and the third input unit is configured to input the characteristics of the N three-dimensional vertexes into a third network, obtain the coordinates of the K three-dimensional vertexes, and generate a three-dimensional model of the target part in a three-dimensional coordinate system, wherein N and K are positive integers.
In one embodiment, the second input unit is specifically configured to perform:
for each two-dimensional key point, fusing the thermodynamic diagram of the two-dimensional key point with the image features to obtain the features of the two-dimensional key point;
the method comprises the steps of converting the characteristics of M two-dimensional key points into the characteristics of N three-dimensional vertexes based on a preset mapping matrix, wherein the N three-dimensional vertexes are all or part of K three-dimensional vertexes required by target portion reconstruction, and the mapping matrix represents the mapping relation between the characteristics of the M two-dimensional key points and the characteristics of the N three-dimensional vertexes, wherein N is smaller than or equal to K.
In an embodiment, the third input unit is specifically configured to perform:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertexes, and updating the features of the three-dimensional vertexes through one feature fusion operation after each feature mapping operation to obtain the final features of the K three-dimensional vertexes;
when N is equal to K, performing feature fusion operation once based on the features of the N three-dimensional vertexes to obtain the final features of the K three-dimensional vertexes;
wherein the feature mapping operation step comprises: mapping the features of one set of three-dimensional vertexes into the features of another set of three-dimensional vertexes, wherein the number of the other set of three-dimensional vertexes is larger than that of the one set of three-dimensional vertexes;
the feature fusion operation step comprises: for each current three-dimensional vertex, determining a preset neighborhood corresponding to the three-dimensional vertex, performing first feature fusion processing on the same-dimension features among the features of the three-dimensional vertex and of all three-dimensional vertices in the preset neighborhood to obtain a fusion feature for each dimension, and performing second feature fusion processing on the fusion features of all the dimensions to update the features of the three-dimensional vertex;
and obtaining the coordinates of the K three-dimensional vertexes based on the characteristics of the final K three-dimensional vertexes.
In one embodiment, the value of N is less than or equal to a predetermined value.
In one embodiment, the apparatus further comprises:
and the sampling unit is configured to perform downsampling on the K three-dimensional vertexes for multiple times when the N is smaller than the K so as to obtain the N three-dimensional vertexes.
In one embodiment, the apparatus further comprises:
a second acquisition unit configured to perform acquisition of an original image; detecting a target part of the original image; and expanding the area where the detected target part is located by a preset multiple to obtain the image to be processed.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the three-dimensional reconstruction method according to any one of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions of the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the three-dimensional reconstruction method according to any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the three-dimensional reconstruction method according to any one of the first aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
inputting the acquired image to be processed containing the target part into a first network to obtain image features of the image to be processed and thermodynamic diagrams of M two-dimensional key points of the target part, then inputting these into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system, and inputting these into a third network to obtain coordinates of K three-dimensional vertices, so as to generate a three-dimensional model of the target part in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. Therefore, three-dimensional reconstruction is realized through a three-dimensional reconstruction model formed by the first network, the second network and the third network: the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points are first used to obtain the features of the N three-dimensional vertices, and the coordinates of the K three-dimensional vertices are then obtained. This realizes a lightweight three-dimensional reconstruction model and facilitates expanding the application scenarios of three-dimensional reconstruction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram of a hand two-dimensional keypoint, illustrated in accordance with an exemplary embodiment.
FIG. 2a is a schematic diagram of an image to be processed, illustrated according to an exemplary embodiment.
FIG. 2b is a schematic diagram of a three-dimensional model of a hand illustrated in accordance with an exemplary embodiment.
FIG. 2c is a schematic diagram of a three-dimensional model of a hand illustrated in accordance with an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating an application scenario in accordance with an exemplary embodiment.
FIG. 4 is a schematic diagram illustrating an application scenario in accordance with an exemplary embodiment.
Fig. 5 is a flow chart illustrating a method of three-dimensional reconstruction in accordance with an exemplary embodiment.
Fig. 6 is a flow chart illustrating a method of three-dimensional reconstruction in accordance with an exemplary embodiment.
Fig. 7 is a schematic diagram illustrating a Ghost module in accordance with an exemplary embodiment.
FIG. 8 is a schematic diagram illustrating a feature fusion operation in accordance with an exemplary embodiment.
Fig. 9 is a flowchart illustrating a three-dimensional reconstruction method according to an exemplary embodiment.
Fig. 10 is a flow chart illustrating a method of three-dimensional reconstruction in accordance with an exemplary embodiment.
FIG. 11 is a flowchart illustrating a method of three-dimensional reconstruction in accordance with an exemplary embodiment.
Fig. 12 is a block diagram illustrating a three-dimensional reconstruction apparatus according to an exemplary embodiment.
FIG. 13 is a block diagram illustrating an electronic device in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Three-dimensional Reconstruction (3D Reconstruction) is an important research direction in the field of computer vision, and realizing three-dimensional reconstruction of a target part in an image based on a monocular image has very important theoretical significance and application value. In the related art, when a target part (for example, a human body part such as a hand) in an image is three-dimensionally reconstructed, a convolutional neural network is used to extract image features and predict the shape parameters and posture parameters of a three-dimensional model, and the coordinates of the target part in three-dimensional space are then calculated. This approach generally requires a large amount of computation and a large number of parameters, which limits the expansion of the application scenarios of three-dimensional reconstruction. For example, such methods generally run on a server side and are difficult to run on a terminal in real time.
Taking a hand three-dimensional reconstruction (Hand Mesh Recovery) method in the related art as an example, the objective is to obtain the position information of a hand in three-dimensional space. Specifically, image features are extracted by a convolutional neural network, and the shape parameters and posture parameters of the hand MANO model are predicted, so that the coordinates of the hand mesh in three-dimensional space are calculated. As defined by the MANO model, a hand may be represented by 778 vertices, which form a mesh of triangular patches. Two-dimensional key point prediction of the hand requires an algorithm to estimate the image coordinates of the hand bone joints; the definition of the two-dimensional key points of the hand is shown in FIG. 1. Illustratively, the hand three-dimensional reconstruction is shown in FIG. 2a, FIG. 2b and FIG. 2c, where FIG. 2a is an image to be processed, and FIG. 2b and FIG. 2c are hand three-dimensional models from different view angles. This approach requires a very large amount of computation and a very large number of parameters, which limits the expansion of the application scenarios of three-dimensional reconstruction; for example, it is difficult to run in real time on a terminal.
Therefore, the embodiment of the disclosure provides a three-dimensional reconstruction method, the calculated amount and the parameter amount are both greatly reduced, the lightweight of the three-dimensional reconstruction model is realized, and the expansion of the application scene according to the actual requirement is facilitated. The three-dimensional reconstruction method provided by the embodiment of the disclosure can be applied to a server. For example, the application scenario shown in fig. 3, the server 301 may execute a three-dimensional reconstruction method and send the result of the three-dimensional reconstruction to the terminal 302. Therefore, the response speed of the server is improved, and more application scenes can be expanded. The three-dimensional reconstruction method provided by the embodiment of the present disclosure may also be applied to a terminal, for example, the application scenario shown in fig. 4, and the terminal 302 may execute the three-dimensional reconstruction method. Therefore, the three-dimensional reconstruction method can be operated in real time at the terminal, the dependence on the server is reduced, and the expansion of more application scenes is facilitated.
The terminal can be a mobile terminal such as a smart phone, a notebook, a palm computer and a tablet computer. The terminal is provided with an application program, and the three-dimensional reconstruction method provided by the embodiment of the disclosure can be realized through the application program. The application may be a short video application, a camera application, or the like.
The server may be a physical server, a cloud server, etc.
The three-dimensional reconstruction method provided by the embodiments of the present disclosure is explained in detail below.
Fig. 5 is a flowchart illustrating a three-dimensional reconstruction method according to an exemplary embodiment, which is used in a terminal or a server, as shown in fig. 5, and includes the following steps.
In step S51, a to-be-processed image is acquired, the to-be-processed image including the target site.
In step S52, the image to be processed is input to the first network, and the image features of the image to be processed and the thermodynamic diagrams of M two-dimensional key points of the target portion are obtained, where M is a positive integer.
In step S53, the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points are input into the second network, and the features of the N three-dimensional vertices in the three-dimensional coordinate system are obtained.
In step S54, the features of the N three-dimensional vertices are input into the third network, and the coordinates of the K three-dimensional vertices are obtained, and the coordinates of the K three-dimensional vertices are used to generate a three-dimensional model of the target portion in a three-dimensional coordinate system, where N and K are positive integers.
The target part is a part to be subjected to three-dimensional reconstruction. The target part may be a human body part, such as a hand, a human face, etc., or may be a part of another object, such as a body part of an animal. The three-dimensional model of the target site may be a mesh-based three-dimensional model, and then, the three-dimensional vertices are vertices in the mesh-based three-dimensional model.
The thermodynamic diagram (heatmap) of a two-dimensional key point represents the position of the two-dimensional key point in the image to be processed.
It can be understood that the three-dimensional reconstruction method of the present embodiment is implemented by the three-dimensional reconstruction model formed by the first network, the second network, and the third network. The first network, the second network and the third network are all learnable networks.
In this embodiment, the acquired image to be processed containing the target part is input into the first network to obtain the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points of the target part; these are then input into the second network to obtain the features of the N three-dimensional vertices in the three-dimensional coordinate system, which are in turn input into the third network to obtain the coordinates of the K three-dimensional vertices, so that a three-dimensional model of the target part is generated in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. Therefore, three-dimensional reconstruction is realized through the three-dimensional reconstruction model formed by the first network, the second network and the third network: the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points are first used to obtain the features of the N three-dimensional vertices, and the coordinates of the K three-dimensional vertices are then obtained, which realizes a lightweight three-dimensional reconstruction model and facilitates expanding the application scenarios of three-dimensional reconstruction.
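As an illustration of the three-stage structure above, the following is a minimal sketch of how the three networks could be chained; the module names, tensor shapes and wiring are assumptions for illustration, not the patented implementation.

```python
# Illustrative sketch of the pipeline of steps S51-S54 (hypothetical module names and shapes).
import torch
import torch.nn as nn

class ThreeDReconstructor(nn.Module):
    def __init__(self, first_net: nn.Module, second_net: nn.Module, third_net: nn.Module):
        super().__init__()
        self.first_net = first_net    # image -> image features + heatmaps of M 2D key points
        self.second_net = second_net  # features + heatmaps -> features of N 3D vertices
        self.third_net = third_net    # features of N 3D vertices -> coordinates of K 3D vertices

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        image_features, heatmaps = self.first_net(image)             # step S52
        vertex_features = self.second_net(image_features, heatmaps)  # step S53
        vertex_coords = self.third_net(vertex_features)              # step S54, shape (batch, K, 3)
        return vertex_coords
```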
After the coordinates of the K three-dimensional vertices are obtained, if the execution subject of the three-dimensional reconstruction method is the terminal, the terminal may generate the three-dimensional model of the target part in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. If the execution subject of the three-dimensional reconstruction method is the server, the server may generate the three-dimensional model of the target part in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices, or the terminal may generate the three-dimensional model of the target part in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices.
In practical application, image samples including the target part can be collected in advance, the image samples are labeled, and the labeled image samples are input into a three-dimensional reconstruction model to be trained so as to obtain a first network, a second network and a third network. The specific training mode may refer to related technologies, which are not described herein.
In an exemplary embodiment, as shown in fig. 6, a specific implementation manner of the above step of obtaining the features of the N three-dimensional vertices in the three-dimensional coordinate system may include:
in step S61, for the thermodynamic diagram of each two-dimensional keypoint, the thermodynamic diagrams of the two-dimensional keypoints are fused with the image features to obtain the features of the two-dimensional keypoints.
The thermodynamic diagram of a two-dimensional key point has the same size as the image to be processed.
In the step, the thermodynamic diagrams of the two-dimensional key points are extracted and fused with the image features to obtain the features of the two-dimensional key points, so that the spatial features in the features of the two-dimensional key points are more obvious.
In step S62, the features of the M two-dimensional key points are converted into the features of N three-dimensional vertices based on a preset mapping matrix, wherein the N three-dimensional vertices are all or part of the K three-dimensional vertices required for reconstructing the target part, and the mapping matrix represents the mapping relationship between the features of the M two-dimensional key points and the features of the N three-dimensional vertices, where N is less than or equal to K.
The preset mapping matrix is a learnable mapping matrix. It is also a linear mapping matrix, that is, the numerical representation of a linear mapping; a linear mapping is a mapping from one vector space to another. Here, a preset mapping matrix A_MN can represent the mapping relationship between the features of the M two-dimensional key points in the image to be processed and the features of the N three-dimensional vertices in the three-dimensional coordinate system, and it contains M × N parameters. M, N and K can be set according to actual conditions. Taking a hand as the target part as an example, the hand may typically be set to M = 21 two-dimensional key points, the hand three-dimensional reconstruction may be set to K = 778 three-dimensional vertices, and the value of N is less than or equal to 778, for example N = 49. Based on this, the number of parameters in the mapping matrix is 21 × 49 = 1029, which does not exceed 21 × 778 = 16338, whereas the parameter counts of convolutional neural networks and the like in the related art are far higher, some on the order of millions or more.
In this embodiment, after the acquired image to be processed containing the target part is input into the first network to obtain the features of the M two-dimensional key points of the target part, the features of the M two-dimensional key points are converted into the features of the N three-dimensional vertices in the three-dimensional coordinate system based on the preset mapping matrix, and these are input into the third network to obtain the coordinates of the K three-dimensional vertices, so that a three-dimensional model of the target part is generated in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. Because the mapping matrix contains only a small number of parameters, the parameter count is greatly reduced compared with convolutional neural networks and the like in the related art, and the amount of computation is correspondingly reduced. In addition, if the value of N is smaller than K, that is, the features of the M two-dimensional key points are mapped to only part of the three-dimensional vertices, the number of parameters contained in the mapping matrix is further reduced and the amount of computation is correspondingly further reduced. Therefore, the three-dimensional reconstruction model is lightweight as a whole, which facilitates expanding the application scenarios of three-dimensional reconstruction.
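To make the scale concrete, the following is a minimal sketch of such a learnable pose-to-vertex mapping, assuming M = 21, N = 49 and a hypothetical per-keypoint feature dimension; only the M × N matrix (1029 parameters) is learned.

```python
# Sketch of the learnable keypoint-to-vertex mapping matrix A_MN (assumed shapes).
import torch
import torch.nn as nn

class PoseToVertexMapping(nn.Module):
    def __init__(self, m_keypoints: int = 21, n_vertices: int = 49):
        super().__init__()
        # The preset (learnable) mapping matrix with M * N = 1029 parameters.
        self.mapping = nn.Parameter(torch.randn(m_keypoints, n_vertices) * 0.01)

    def forward(self, keypoint_features: torch.Tensor) -> torch.Tensor:
        # keypoint_features: (batch, M, C) -> vertex_features: (batch, N, C)
        return torch.einsum("mn,bmc->bnc", self.mapping, keypoint_features)
```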
In an exemplary embodiment, the step of fusing the thermodynamic diagram of a two-dimensional key point with the image features may specifically include: multiplying the thermodynamic diagram of the two-dimensional key point by the image features and then pooling. The pooling may be max pooling, sum pooling, or the like. In this way, the thermodynamic diagram of the two-dimensional key point is multiplied by the image features to suppress non-key-point features, and the spatial features are then reduced by pooling.
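A minimal sketch of this multiply-then-pool step is shown below; the tensor layout and the choice of max pooling (with sum pooling noted in a comment) are assumptions for illustration.

```python
# Sketch of pose pooling: weight the feature map by each keypoint heatmap, then pool spatially.
import torch

def pose_pooling(image_features: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
    # image_features: (B, C, H, W); heatmaps: (B, M, H, W)
    weighted = image_features.unsqueeze(1) * heatmaps.unsqueeze(2)  # (B, M, C, H, W)
    # Max pooling over the spatial dimensions; sum pooling would be weighted.sum(dim=(-2, -1)).
    return weighted.amax(dim=(-2, -1))                              # (B, M, C)
```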
In an exemplary embodiment, obtaining the thermodynamic diagrams of the M two-dimensional key points of the target part may specifically include obtaining the thermodynamic diagrams of the M two-dimensional key points based on the image features. Specifically, convolution and bilinear upsampling may be used alternately to upsample the image features so as to obtain the thermodynamic diagrams of the two-dimensional key points. Alternatively, the image features may be upsampled by convolution with nearest-neighbor interpolation to obtain the thermodynamic diagrams of the two-dimensional key points. The upsampling manner can be flexibly selected according to the actual situation.
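As an illustration, a decoder that alternates upsampling and convolution could be sketched as follows; the number of stages and channel sizes are assumptions, and switching the mode argument to "nearest" gives the nearest-neighbor alternative described above.

```python
# Sketch of a heatmap decoder alternating bilinear upsampling and convolution (assumed sizes).
import torch.nn as nn

def make_heatmap_decoder(in_channels: int, num_keypoints: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(in_channels, in_channels // 2, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(in_channels // 2, num_keypoints, kernel_size=3, padding=1),
    )
```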
In an exemplary embodiment, the step of obtaining the image features may specifically include: extracting the image features from the image to be processed based on a convolutional neural network; or extracting the image features from the image to be processed based on a Ghost module.
The network structure of the Ghost module is shown in FIG. 7, and includes a first convolution layer 701, a first group convolution layer 702, an average pooling layer 703, a fully connected layer 704, a sigmoid activation function 705, a second convolution layer 706, and a second group convolution layer 707. Based on this, the image to be processed is input into the first convolution layer 701 to obtain a first feature; the first feature is input into the first group convolution layer 702 to obtain a second feature; the first feature and the second feature are concatenated (denoted by c in FIG. 7) to obtain a third feature; the third feature is input into the average pooling layer 703, and a fourth feature is obtained after processing by the average pooling layer 703, the fully connected layer 704 and the sigmoid 705; the fourth feature is multiplied by the third feature (x in FIG. 7) to obtain a fifth feature, and the fifth feature is input into the second convolution layer 706 to obtain a sixth feature; the sixth feature is input into the second group convolution layer 707 to obtain a seventh feature, the sixth feature and the seventh feature are concatenated to obtain an eighth feature, and the eighth feature is combined with the image to be processed (indicated by + in FIG. 7) to obtain the image features.
The concept of the Ghost module is derived from GhostNet: richer features can be obtained by using a linear mapping (namely, concatenation) of the basic features, i.e., the first feature and the second feature, as the image features, so that more feature expressions are obtained with less computation. Using the Ghost module instead of a conventional convolutional neural network to extract image features can greatly reduce the number of parameters and the amount of computation.
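The following is a minimal sketch of the module structure described for FIG. 7; the channel sizes, the use of depthwise convolution for the group convolutions, and the assumption that the input and output channel counts match (so the final + is a residual addition) are illustrative choices, not details given by the patent.

```python
# Sketch of the Ghost module wiring of FIG. 7 (channel sizes and residual assumption are mine).
import torch
import torch.nn as nn

class GhostModuleSketch(nn.Module):
    def __init__(self, channels: int, hidden: int = 32):
        super().__init__()
        # "channels" is assumed even so that the final concatenation matches the input for the residual.
        self.conv1 = nn.Conv2d(channels, hidden, 3, padding=1)                   # first convolution layer 701
        self.group_conv1 = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # first group convolution 702
        self.pool = nn.AdaptiveAvgPool2d(1)                                      # average pooling layer 703
        self.fc = nn.Linear(2 * hidden, 2 * hidden)                              # fully connected layer 704
        self.conv2 = nn.Conv2d(2 * hidden, channels // 2, 1)                     # second convolution layer 706
        self.group_conv2 = nn.Conv2d(channels // 2, channels // 2, 3, padding=1,
                                     groups=channels // 2)                       # second group convolution 707

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.conv1(x)                                    # first feature
        f2 = self.group_conv1(f1)                             # second feature
        f3 = torch.cat([f1, f2], dim=1)                       # third feature ("c" in FIG. 7)
        w = torch.sigmoid(self.fc(self.pool(f3).flatten(1)))  # fourth feature (channel weights, sigmoid 705)
        f5 = f3 * w[:, :, None, None]                         # fifth feature ("x" in FIG. 7)
        f6 = self.conv2(f5)                                   # sixth feature
        f7 = self.group_conv2(f6)                             # seventh feature
        f8 = torch.cat([f6, f7], dim=1)                       # eighth feature
        return f8 + x                                         # combined with the input ("+" in FIG. 7)
```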
In an exemplary embodiment, the third network is a graph convolution network. A graph convolution network performs convolution operations on a graph and can better extract features on the graph. In practical applications, a conventional graph convolution network may be used as the third network. In the embodiments of the present disclosure, in order to further achieve a lightweight three-dimensional reconstruction model, the inventor improves the graph convolution network and provides a depth-separable graph convolution, which will be described in detail in the following embodiments.
In step S54, the step of obtaining the coordinates of the K three-dimensional vertices may specifically include: when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertices, and after each feature mapping operation, updating the features of the three-dimensional vertices through one feature fusion operation to obtain the final features of the K three-dimensional vertices; when N is equal to K, performing one feature fusion operation based on the features of the N three-dimensional vertices to obtain the final features of the K three-dimensional vertices; and obtaining the coordinates of the K three-dimensional vertices based on the final features of the K three-dimensional vertices.
The feature mapping operation steps may specifically include: and mapping the features of one set of three-dimensional vertexes into the features of another set of three-dimensional vertexes, wherein the number of the other set of three-dimensional vertexes is larger than that of the one set of three-dimensional vertexes.
The feature fusion operation step may specifically include: for each current three-dimensional vertex, determining a preset neighborhood corresponding to the three-dimensional vertex, performing first feature fusion processing on the same-dimension features among the features of the three-dimensional vertex and of all three-dimensional vertices in the preset neighborhood to obtain a fusion feature for each dimension, and performing second feature fusion processing on the fusion features of all the dimensions to update the features of the three-dimensional vertex.
In practical application, when N is equal to K, the features of all three-dimensional vertices are obtained through conversion by directly using the features of M two-dimensional key points, and feature fusion operation can be directly performed on the features of N three-dimensional vertices without performing feature mapping operation.
When N is smaller than K, the M two-dimensional key points are converted into only part of the three-dimensional vertices, which corresponds to a rough three-dimensional model that needs further refinement to obtain the features of the K three-dimensional vertices. In practical applications, the refinement may be realized based on at least one feature mapping operation. Specifically, the features of the N three-dimensional vertices may be mapped into the features of the K three-dimensional vertices based on one feature mapping operation, or based on multiple feature mapping operations. After each feature mapping operation, the features of the three-dimensional vertices are updated through one feature fusion operation.
For example, in the case of a hand, if N is 49 and K is 778, the features of the 49 three-dimensional vertices may be mapped into the features of 98 three-dimensional vertices based on a first feature mapping operation, and the features of the 98 three-dimensional vertices are updated through one feature fusion operation; the features of the 98 three-dimensional vertices are mapped into the features of 195 three-dimensional vertices based on a second feature mapping operation, and the features of the 195 three-dimensional vertices are updated through one feature fusion operation; the features of the 195 three-dimensional vertices are mapped into the features of 396 three-dimensional vertices based on a third feature mapping operation, and the features of the 396 three-dimensional vertices are updated through one feature fusion operation; and the features of the 396 three-dimensional vertices are mapped into the features of 778 three-dimensional vertices based on a fourth feature mapping operation, and the features of the 778 three-dimensional vertices are updated through one feature fusion operation, so as to obtain the final features of the 778 three-dimensional vertices.
In this embodiment, when N is smaller than K, the features of the N three-dimensional vertices can be gradually mapped into the features of the K three-dimensional vertices through at least one feature mapping operation, so that the features before and after each mapping are closer to each other, which in turn is closer to the real situation and gives a better three-dimensional effect. Moreover, the features of the three-dimensional vertices can be updated through the feature fusion operation. During fusion, the features of the same dimension are fused first (that is, features in the spatial direction are fused), and then the fused features of different dimensions are fused (that is, features in the depth direction are fused). In other words, the feature fusion is decomposed into two fusions, one in the depth direction and one in the spatial direction, so that a depth-separable feature fusion is realized, high-dimensional intermediate features are avoided, and the number of parameters and the amount of computation are further reduced.
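As an illustration of this coarse-to-fine process, the following sketch chains the feature mapping and feature fusion operations for the hand example (49 -> 98 -> 195 -> 396 -> 778); the randomly initialized mapping matrices and the linear fusion blocks are placeholders for the preset mappings and the fusion operation described above, not the actual design.

```python
# Sketch of coarse-to-fine vertex upsampling with a fusion step after each mapping (placeholders).
import torch
import torch.nn as nn

class CoarseToFineDecoder(nn.Module):
    def __init__(self, feat_dim: int, vertex_counts=(49, 98, 195, 396, 778)):
        super().__init__()
        # One fixed (preset) mapping matrix per feature mapping operation; random init is a placeholder
        # for the real precomputed upsampling matrices.
        self.upsample_mats = nn.ParameterList([
            nn.Parameter(torch.randn(vertex_counts[i + 1], vertex_counts[i]) * 0.01,
                         requires_grad=False)
            for i in range(len(vertex_counts) - 1)
        ])
        # One feature fusion block (stand-in) after every mapping operation.
        self.fusion_blocks = nn.ModuleList([
            nn.Linear(feat_dim, feat_dim) for _ in range(len(vertex_counts) - 1)
        ])
        self.to_coords = nn.Linear(feat_dim, 3)  # final K x 3 vertex coordinates

    def forward(self, vertex_features: torch.Tensor) -> torch.Tensor:
        # vertex_features: (batch, 49, feat_dim)
        x = vertex_features
        for mat, fuse in zip(self.upsample_mats, self.fusion_blocks):
            x = torch.einsum("kn,bnc->bkc", mat, x)  # feature mapping: fewer vertices -> more vertices
            x = fuse(x)                              # feature fusion operation (placeholder)
        return self.to_coords(x)                     # (batch, 778, 3)
```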
When the feature mapping operation is performed, the features of one group of three-dimensional vertices may be mapped to the features of another group of three-dimensional vertices based on a preset mapping relationship, where the preset mapping relationship may be preset.
The first feature fusion processing may be performed by concatenating the features of the same dimension to obtain a first concatenated feature, and convolving the first concatenated feature to obtain the fusion feature of that dimension. The second feature fusion processing may be performed by concatenating the fusion features of all dimensions to obtain a second concatenated feature, and convolving the second concatenated feature to obtain the updated features of the three-dimensional vertex. In this manner, a depth-separable graph convolution can be realized. The convolution may be a spiral convolution (SpiralConv), in which case the preset neighborhood is the spiral region.
Taking a feature dimension of 3 for each three-dimensional vertex as an example, as shown in FIG. 8, assume that the neighborhood of three-dimensional vertex 0 includes three-dimensional vertex 1 and three-dimensional vertex 2. Three-dimensional vertex 0 has features {a1, a2, a3} (illustrated in FIG. 8 by three dot fills of different densities), three-dimensional vertex 1 has features {b1, b2, b3} (illustrated by three diagonal-stripe fills of different densities), and three-dimensional vertex 2 has features {c1, c2, c3} (illustrated by three cross-hatch fills of different densities).
The feature a1 of three-dimensional vertex 0, the feature b1 of three-dimensional vertex 1 and the feature c1 of three-dimensional vertex 2 have the same dimension; concatenating these same-dimension features yields {a1, b1, c1}, and convolving {a1, b1, c1} yields the fusion feature f1 of that dimension (shown in FIG. 8 by the widest-spaced vertical-stripe fill). Similarly, a2, b2 and c2 have the same dimension; concatenating them yields {a2, b2, c2}, and convolving {a2, b2, c2} yields the fusion feature f2 of that dimension (medium-spaced vertical-stripe fill). a3, b3 and c3 have the same dimension; concatenating them yields {a3, b3, c3}, and convolving {a3, b3, c3} yields the fusion feature f3 of that dimension (narrowest-spaced vertical-stripe fill). This process is also known as the depth-wise operation.
Then, the fusion features f1, f2 and f3 of the individual dimensions are concatenated to obtain {f1, f2, f3}, and {f1, f2, f3} is convolved to obtain the updated feature {a1'} of three-dimensional vertex 0, i.e., the output feature in FIG. 8. The features of each three-dimensional vertex are updated in this way; this process is also known as the point-wise operation.
In addition, the first feature fusion processing may be performed in another manner; for example, the features of the same dimension may be weighted, summed and convolved to obtain the fusion feature of that dimension. The second feature fusion processing may also be performed in another manner; for example, the fusion features of the dimensions may be weighted, summed and convolved to obtain the updated features of the three-dimensional vertex.
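The depth-wise and point-wise steps above can be sketched as a single module as follows; the spiral neighborhood indices are assumed to be precomputed, and the exact weight shapes of the one-dimensional convolutions are my assumption for illustration.

```python
# Sketch of a depth-separable spiral convolution: per-dimension fusion over the spiral
# neighborhood (depth-wise), then fusion across dimensions (point-wise).
import torch
import torch.nn as nn

class DepthSeparableSpiralConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, spiral_len: int):
        super().__init__()
        # Depth-wise step: one filter of length spiral_len per feature dimension.
        self.depthwise = nn.Parameter(torch.randn(in_dim, spiral_len) * 0.01)
        # Point-wise step: fuse the per-dimension results across feature dimensions.
        self.pointwise = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, spiral_indices: torch.Tensor) -> torch.Tensor:
        # x: (batch, V, in_dim); spiral_indices: (V, spiral_len) vertex ids of each spiral neighborhood
        neighbors = x[:, spiral_indices, :]                  # (batch, V, spiral_len, in_dim)
        fused = (neighbors * self.depthwise.t()).sum(dim=2)  # depth-wise fusion -> (batch, V, in_dim)
        return self.pointwise(fused)                         # point-wise fusion -> (batch, V, out_dim)
```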
In an exemplary embodiment, the value of N is less than or equal to a preset value. The preset value is a relatively small value: if the value of N is too large, it is far from the number M of two-dimensional key points and the feature gap is large, so that in some scenarios a large number of three-dimensional vertex features would be converted directly from the features of a small number of two-dimensional key points, which may cause distortion of the converted N three-dimensional vertex features and degrade the three-dimensional model.
In an exemplary embodiment, before converting the features of the M two-dimensional key points into the features of the N three-dimensional vertices in the three-dimensional coordinate system based on the preset mapping matrix, the three-dimensional reconstruction method may further include: when N is smaller than K, downsampling the K three-dimensional vertices multiple times to obtain the N three-dimensional vertices. Still by way of example, assume that N equals 49 and K equals 778; based on this, the 778 three-dimensional vertices may be downsampled by a factor of 2 four times to obtain 49 three-dimensional vertices. Through multiple downsampling, the three-dimensional vertices can be retained more reasonably, so that the rough model formed by the retained three-dimensional vertices is closer to the real target part.
In an exemplary embodiment, before the step of acquiring an image to be processed, as shown in fig. 9, the three-dimensional reconstruction method may further include:
in step S91, an original image is acquired.
In practical applications, the original image may be an original still image or a video image extracted from a video.
In step S92, the target region is detected for the original image.
In step S93, the region where the detected target site is located is expanded by a predetermined multiple.
The specific value of the preset multiple may be set according to actual conditions, for example, set to 1.3 times.
In this embodiment, the image to be processed is obtained by expanding the region where the target portion is located by the preset multiple, and not only includes information of the target portion, but also includes information around the target portion, so that the image to be processed can include richer information, and the three-dimensional reconstruction effect can be improved.
Of course, the region where the target portion is located may also be directly determined as the image to be processed.
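As an illustration of the region expansion described above, a simple sketch follows; clamping the expanded box to the image bounds is an added assumption, not stated in the patent text.

```python
# Sketch: expand a detected box by a preset multiple (1.3 in the example) around its center.
def expand_box(x1: float, y1: float, x2: float, y2: float,
               img_w: int, img_h: int, scale: float = 1.3):
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    # Clamp to the original image bounds (assumption).
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))
```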
In an exemplary embodiment, the first network may include at least one two-dimensional (2D) encoder and a 2D decoder corresponding to each 2D encoder, and may further include a pooling layer. If multiple 2D encoders are included, the results of the previous 2D encoder are fused (e.g., concatenated) with the results of the corresponding 2D decoder as the input to the subsequent 2D encoder. Then, in step S52, the step of inputting the image to be processed into the first network to obtain the features of the M two-dimensional key points of the target part may specifically include: extracting image features through a 2D encoder; obtaining thermodynamic diagrams of the M two-dimensional key points based on the image features through a 2D decoder; and fusing the thermodynamic diagrams of the two-dimensional key points with the image features through the pooling layer to obtain the features of the two-dimensional key points. The 2D encoder may be the Ghost module described above. It can be considered that 2D encoding is realized by the first network; accordingly, the second network and the third network realize the 2D-3D mapping and 3D decoding, respectively.
The following describes a three-dimensional reconstruction method provided by the embodiments of the present disclosure in more detail, taking three-dimensional reconstruction of a hand as an example.
In this embodiment, a lightweight hand mesh three-dimensional reconstruction technique is provided, which requires a smaller amount of computation and fewer parameters and obtains the hand MANO model. Specifically, as shown in FIG. 10, the monocular-image-based three-dimensional reconstruction method is divided into three stages, namely 2D encoding, 2D-3D mapping and 3D decoding: a Ghost module is designed to implement the 2D encoding, a pose pooling and pose-vertex mapping method is used to implement the 2D-3D mapping, and a depth-separable graph convolution method is used to implement the 3D decoding.
The specific process of 2D encoding is as follows:
in step one, an original image is acquired.
In step two, the hand is detected for the original image.
In this step, the position of the hand can be determined by using a conventional hand detection method, such as CenterNet (an object detection network), to obtain a detection frame.
In step three, the detected area where the hand is located is expanded by a preset multiple around its center to obtain the image to be processed.
In this step, an image block (i.e., the image to be processed) including the hand is obtained by expanding the image block by 1.3 times with the detection frame as the center.
In the fourth step, the image to be processed is input into the first network, and the image characteristics and the thermodynamic diagrams of M two-dimensional key points of the target part are obtained.
In this step, as shown in FIG. 10, the first network includes a first 2D encoder 1001, a first 2D decoder 1002, a second 2D encoder 1003, and a second 2D decoder 1004; the first 2D encoder and the second 2D encoder are Ghost modules. Based on this, the image to be processed is input into the first 2D encoder 1001, and the image features extracted by the first 2D encoder 1001 are input into the first 2D decoder 1002. The thermodynamic diagrams of the M two-dimensional key points obtained by the first 2D decoder 1002 are concatenated with the image features extracted by the first 2D encoder 1001 (indicated by c in FIG. 10) and then input into the second 2D encoder 1003. The image features extracted by the second 2D encoder 1003 are input into the second 2D decoder 1004, and the thermodynamic diagrams of the M two-dimensional key points are obtained by the second 2D decoder 1004.
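The wiring of this two-stage first network can be sketched as follows; the encoder and decoder internals are placeholders, and compatible spatial sizes at the concatenation are assumed.

```python
# Sketch of the two-stage first network of FIG. 10 (module internals are placeholders).
import torch
import torch.nn as nn

class FirstNetworkSketch(nn.Module):
    def __init__(self, enc1: nn.Module, dec1: nn.Module, enc2: nn.Module, dec2: nn.Module):
        super().__init__()
        self.enc1, self.dec1, self.enc2, self.dec2 = enc1, dec1, enc2, dec2

    def forward(self, image: torch.Tensor):
        feat1 = self.enc1(image)                              # features from first 2D encoder 1001
        heat1 = self.dec1(feat1)                              # heatmaps from first 2D decoder 1002
        feat2 = self.enc2(torch.cat([feat1, heat1], dim=1))   # concatenation ("c" in FIG. 10), then encoder 1003
        heat2 = self.dec2(feat2)                              # final heatmaps from second 2D decoder 1004
        return feat2, heat2
```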
According to the definition of the hand MANO model, M = 21 key points are used for the hand. The pose pooling method multiplies the image features by the thermodynamic diagrams of the two-dimensional key points to suppress non-key-point features, and then reduces the spatial features by max pooling or sum pooling to obtain the features of the 21 two-dimensional key points.
The specific process of 2D-3D mapping is as follows:
and step five, inputting the image characteristics and the thermodynamic diagrams of the M two-dimensional key points of the target part into a second network to obtain the characteristics of N three-dimensional vertexes in the three-dimensional coordinate system.
Specifically, as shown in fig. 10 and 11, the second network includes a pooling layer 1005 and a preset mapping matrix 1006.
The image features extracted by the second 2D encoder 1003 and the thermodynamic diagrams of the M two-dimensional key points obtained by the second 2D decoder 1004 are input into the pooling layer 1005; the pooling layer 1005 multiplies the input image features by the thermodynamic diagrams of the M two-dimensional key points respectively, and then performs max pooling or sum pooling to obtain the features of the M two-dimensional key points. Since the features of the M two-dimensional key points reflect the pose of the hand, the operation of the pooling layer is also referred to as pose pooling. Then, based on the preset mapping matrix 1006, the features of the M two-dimensional key points are converted into the features of N three-dimensional vertices in the three-dimensional coordinate system. Because the features of the M two-dimensional key points reflect the pose of the hand and the three-dimensional vertices are the vertices of the mesh three-dimensional model, this step is also called pose-vertex mapping.
In the pose-vertex mapping stage, this scheme designs a learnable mapping matrix to convert the pose features into vertex features. Since the MANO model has 778 vertices, this number is much greater than the number of key points. Therefore, this scheme downsamples the MANO model 4 times to obtain a rough three-dimensional model containing only 49 vertices, so that the features of these 49 vertices can be obtained through the pose-vertex mapping.
The specific process of 3D decoding is as follows:
and step six, inputting the characteristics of the N three-dimensional vertexes into a third network to obtain the coordinates of the K three-dimensional vertexes.
In this embodiment, as shown in FIG. 10, the third network is the 3D decoder 1107, which may specifically be a graph convolution network. This scheme designs a depth-separable graph convolution method to implement the 3D decoding, that is, the coordinates of the 778 MANO vertices are obtained from the features of the 49 rough mesh vertices. First, following SpiralConv, the spiral region of each vertex is defined as the neighborhood of that vertex. For each vertex and the vertices in its neighborhood, this scheme designs a depth-wise operation that concatenates the same-dimension features of these vertices and performs feature fusion with a one-dimensional convolution to obtain the fusion feature of each dimension. Then a point-wise operation is designed, which concatenates the fusion features of the different dimensions in the depth direction and performs feature fusion again with a one-dimensional convolution. Compared with the conventional SpiralConv, this separable structure effectively reduces the amount of computation and the number of parameters.
The scheme has the following effects:
1. The scheme uses the Ghost module for 2D encoding, reducing the computation of this part by a factor of 20 and the number of parameters by a factor of 4.
2. The scheme performs 2D-3D feature mapping based on pose pooling and pose-vertex mapping, which effectively extracts the 2D pose information and converts the pose features into vertex features by a linear mapping, realizing the feature mapping from two-dimensional key points to mesh vertexes, that is, from 2D to 3D. Moreover, the whole mapping process contains only 21 × 49 = 1029 learnable parameters. The computation of this part is reduced by a factor of 50 and the number of parameters by a factor of more than 100.
3. The scheme performs 3D decoding with the depth-separable graph convolution; by decomposing feature fusion into two successive fusions, one in the depth direction and one in the spatial direction, the computation and the number of parameters of the graph convolution are effectively reduced. Both the computation and the number of parameters of this part are reduced by a factor of 20.
These three design points make the model lightweight. The resulting hand three-dimensional reconstruction model contains only 121M multiply-add operations and 5M parameters.
Therefore, the scheme provides model lightweighting methods for 2D encoding, 2D-3D mapping and 3D decoding, and the inference speed on a Qualcomm Snapdragon 855 CPU can reach 28 FPS. This expands the application scenarios of three-dimensional reconstruction; for example, a real-time hand mesh reconstruction scheme running on a terminal can be formed.
Fig. 12 is a block diagram illustrating a three-dimensional reconstruction apparatus according to an exemplary embodiment. Referring to fig. 12, the apparatus 1200 includes a first acquiring unit 1201, a first input unit 1202, a second input unit 1203, and a third input unit 1204.
The first acquiring unit 1201 is configured to perform acquiring an image to be processed, the image to be processed including a target portion;
the first input unit 1202 is configured to perform inputting of an image to be processed into a first network, and obtain image features of the image to be processed and thermodynamic diagrams of M two-dimensional key points of a target part, where M is a positive integer;
the second input unit 1203 is configured to perform inputting the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points into a second network, to obtain the features of N three-dimensional vertexes in a three-dimensional coordinate system;
the third input unit 1204 is configured to perform inputting the features of the N three-dimensional vertices into a third network, obtaining coordinates of K three-dimensional vertices, the coordinates of the K three-dimensional vertices being used for generating a three-dimensional model of the target portion in a three-dimensional coordinate system, where N and K are positive integers.
In an embodiment, the second input unit 1203 is specifically configured to perform:
for the thermodynamic diagram of each two-dimensional key point, fusing the thermodynamic diagrams of the two-dimensional key points and the image features to obtain the features of the two-dimensional key points;
the method comprises the steps of converting the characteristics of M two-dimensional key points into the characteristics of N three-dimensional vertexes based on a preset mapping matrix, wherein the N three-dimensional vertexes are all or part of K three-dimensional vertexes required by target portion reconstruction, and the mapping matrix represents the mapping relation between the characteristics of the M two-dimensional key points and the characteristics of the N three-dimensional vertexes, wherein N is smaller than or equal to K.
In an embodiment, the third input unit 1204 is specifically configured to perform:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertexes, and updating the features of the three-dimensional vertexes through one feature fusion operation after each feature mapping operation to obtain the final features of the K three-dimensional vertexes;
when N is equal to K, performing feature fusion operation once based on the features of the N three-dimensional vertexes to obtain the final features of the K three-dimensional vertexes;
wherein the feature mapping operation step comprises: mapping the features of one set of three-dimensional vertexes into the features of another set of three-dimensional vertexes, wherein the number of the other set of three-dimensional vertexes is larger than that of the one set of three-dimensional vertexes;
the feature fusion operation step comprises: determining a preset neighborhood corresponding to each current three-dimensional vertex characteristic, performing first characteristic fusion processing on the three-dimensional vertex and the same-dimension characteristic in the characteristics of all the three-dimensional vertices in the preset neighborhood to obtain the same-dimension fusion characteristic, and performing second characteristic fusion processing on the fusion characteristic of all the dimensions to update the characteristics of the three-dimensional vertices;
and obtaining the coordinates of the K three-dimensional vertexes based on the characteristics of the final K three-dimensional vertexes.
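To illustrate how the feature mapping and feature fusion operations just described can alternate to go from the N coarse vertexes to the K final vertexes, the following is a hypothetical coarse-to-fine decoding sketch; the upsampling matrices, the fusion layers (standing in for the depth-separable spiral convolution sketched earlier) and the final coordinate regression are assumptions for illustration, not the patent's exact structure.

```python
import torch
import torch.nn as nn

class CoarseToFineDecoder(nn.Module):
    def __init__(self, upsample_mats, fusion_layers, channels: int):
        super().__init__()
        # upsample_mats: list of fixed (K_next, K_prev) matrices from mesh downsampling.
        for i, m in enumerate(upsample_mats):
            self.register_buffer(f"up{i}", m)
        self.fusion_layers = nn.ModuleList(fusion_layers)   # one fusion layer per stage
        self.to_xyz = nn.Linear(channels, 3)                 # regress final vertex coordinates

    def forward(self, vertex_feats: torch.Tensor) -> torch.Tensor:
        """vertex_feats: (B, N, C) coarse vertex features -> (B, K, 3) coordinates."""
        for i, fuse in enumerate(self.fusion_layers):
            up = getattr(self, f"up{i}")
            # Feature mapping operation: map features onto the larger vertex set.
            vertex_feats = torch.einsum("kn,bnc->bkc", up, vertex_feats)
            # Feature fusion operation: update features within each vertex neighborhood.
            vertex_feats = fuse(vertex_feats)
        return self.to_xyz(vertex_feats)
```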
In one embodiment, the value of N is less than or equal to a predetermined value.
In one embodiment, the apparatus may further include:
and the sampling unit is configured to perform downsampling on the K three-dimensional vertexes for multiple times when the N is smaller than the K so as to obtain the N three-dimensional vertexes.
In one embodiment, the apparatus may further include:
a second acquisition unit configured to perform acquisition of an original image; detecting a target part of the original image; and expanding the area where the detected target part is located by a preset multiple to obtain the image to be processed.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 13 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device 1300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 13, electronic device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an interface for input/output (I/O) 1312, a sensor component 1314, and a communications component 1316.
The processing component 1302 generally controls overall operation of the electronic device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1302 may include one or more processors 1320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 1302 can include one or more modules that facilitate interaction between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operation at the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1304 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1306 provides power to the various components of the electronic device 1300. Power components 1306 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 1300.
The multimedia component 1308 includes a screen between the electronic device 1300 and a user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1308 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the electronic device 1300 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1300 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1314 includes one or more sensors for providing various aspects of state assessment for the electronic device 1300. For example, the sensor assembly 1314 may detect an open/closed state of the electronic device 1300, the relative positioning of components, such as a display and keypad of the electronic device 1300, the sensor assembly 1314 may also detect a change in the position of the electronic device 1300 or a component of the electronic device 1300, the presence or absence of user contact with the electronic device 1300, orientation or acceleration/deceleration of the electronic device 1300, and a change in the temperature of the electronic device 1300. The sensor assembly 1314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate communications between the electronic device 1300 and other devices in a wired or wireless manner. The electronic device 1300 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1316 also includes a Near Field Communications (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described three-dimensional reconstruction method.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 1304 comprising instructions, executable by the processor 1320 of the electronic device 1300 to perform the above-described three-dimensional reconstruction method is also provided. Alternatively, the computer-readable storage medium may be a non-transitory computer-readable storage medium, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which comprises readable program code executable by the processor 1320 of the electronic device 1300 for performing the above-described three-dimensional reconstruction method. Alternatively, the program code may be stored in a storage medium of the electronic device 1300, which may be a non-transitory computer-readable storage medium, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (10)
1. A method of three-dimensional reconstruction, comprising:
acquiring an image to be processed, wherein the image to be processed comprises a target part;
inputting the image to be processed into a first network to obtain image features of the image to be processed and thermodynamic diagrams of M two-dimensional key points of the target part, wherein M is a positive integer;
inputting the image characteristics of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points into a second network to obtain the characteristics of N three-dimensional vertexes in a three-dimensional coordinate system;
and inputting the characteristics of the N three-dimensional vertexes into a third network to obtain coordinates of K three-dimensional vertexes, wherein the coordinates of the K three-dimensional vertexes are used for generating a three-dimensional model of the target part in the three-dimensional coordinate system, and N and K are positive integers.
2. The three-dimensional reconstruction method of claim 1, wherein the step of obtaining the features of N three-dimensional vertexes in a three-dimensional coordinate system comprises:
for the thermodynamic diagram of each two-dimensional key point, fusing the thermodynamic diagrams of the two-dimensional key points and the image features to obtain features of the two-dimensional key points;
converting the features of the M two-dimensional key points into the features of the N three-dimensional vertexes based on a preset mapping matrix, wherein the N three-dimensional vertexes are all or part of K three-dimensional vertexes required for reconstructing the target part, and the mapping matrix represents the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertexes, and N is smaller than or equal to K.
3. The three-dimensional reconstruction method according to claim 1 or 2, wherein the step of obtaining the coordinates of the K three-dimensional vertices comprises:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertexes, and updating the features of the three-dimensional vertexes through one feature fusion operation after each feature mapping operation to obtain the final features of the K three-dimensional vertexes;
when N is equal to K, performing the feature fusion operation once based on the features of the N three-dimensional vertexes to obtain the final features of the K three-dimensional vertexes;
wherein the feature mapping operation step comprises: mapping features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of the another set of three-dimensional vertices being greater than the number of the set of three-dimensional vertices;
the feature fusion operation step comprises: determining a preset neighborhood corresponding to each current three-dimensional vertex characteristic, performing first characteristic fusion processing on the three-dimensional vertex and the same-dimension characteristic in the characteristics of all the three-dimensional vertices in the preset neighborhood to obtain the same-dimension fusion characteristic, and performing second characteristic fusion processing on the fusion characteristic of all the dimensions to update the characteristics of the three-dimensional vertices;
and obtaining the coordinates of the K three-dimensional vertexes based on the final characteristics of the K three-dimensional vertexes.
4. The three-dimensional reconstruction method of claim 1, wherein a value of N is less than or equal to a preset value.
5. The three-dimensional reconstruction method according to claim 2, wherein before the converting of the features of the M two-dimensional key points into the features of the N three-dimensional vertexes based on the preset mapping matrix, the method further comprises:
and when N is smaller than K, performing down-sampling on the K three-dimensional vertexes for multiple times to obtain the N three-dimensional vertexes.
6. The three-dimensional reconstruction method of claim 1, wherein prior to said acquiring an image to be processed step, said method further comprises:
acquiring an original image;
detecting a target part of the original image;
and expanding a preset multiple by taking the detected region where the target part is located as a center to obtain the image to be processed.
7. A three-dimensional reconstruction apparatus, comprising:
a first acquisition unit configured to perform acquisition of an image to be processed, the image to be processed including a target portion;
the first input unit is configured to input the image to be processed into a first network, and obtain image features of the image to be processed and thermodynamic diagrams of M two-dimensional key points of the target part, wherein M is a positive integer;
the second input unit is configured to execute thermodynamic diagrams of the image features of the image to be processed and the M two-dimensional key points, input a second network and obtain the features of N three-dimensional vertexes in a three-dimensional coordinate system;
a third input unit configured to perform inputting the features of the N three-dimensional vertices into a third network, resulting in coordinates of K three-dimensional vertices used for generating a three-dimensional model of the target portion in the three-dimensional coordinate system, wherein N and K are positive integers.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the three-dimensional reconstruction method of any one of claims 1 to 6.
9. A computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the three-dimensional reconstruction method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, the three-dimensional reconstruction method according to any one of claims 1 to 6 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110924536.2A CN113724393B (en) | 2021-08-12 | 2021-08-12 | Three-dimensional reconstruction method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113724393A true CN113724393A (en) | 2021-11-30 |
CN113724393B CN113724393B (en) | 2024-03-19 |
Family
ID=78675668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110924536.2A Active CN113724393B (en) | 2021-08-12 | 2021-08-12 | Three-dimensional reconstruction method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113724393B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114898039A (en) * | 2022-05-09 | 2022-08-12 | 北京达佳互联信息技术有限公司 | Three-dimensional model construction method and device, electronic equipment and storage medium |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902702A (en) * | 2018-07-26 | 2019-06-18 | 华为技术有限公司 | The method and apparatus of target detection |
US20200043189A1 (en) * | 2017-01-13 | 2020-02-06 | Zhejiang University | Simultaneous positioning and dense three-dimensional reconstruction method |
CN110991319A (en) * | 2019-11-29 | 2020-04-10 | 广州市百果园信息技术有限公司 | Hand key point detection method, gesture recognition method and related device |
CN111047548A (en) * | 2020-03-12 | 2020-04-21 | 腾讯科技(深圳)有限公司 | Attitude transformation data processing method and device, computer equipment and storage medium |
CN111275066A (en) * | 2018-12-05 | 2020-06-12 | 北京嘀嘀无限科技发展有限公司 | Image feature fusion method and device and electronic equipment |
CN111369681A (en) * | 2020-03-02 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Three-dimensional model reconstruction method, device, equipment and storage medium |
CN111383205A (en) * | 2020-03-11 | 2020-07-07 | 西安应用光学研究所 | Image fusion positioning method based on feature points and three-dimensional model |
CN111598131A (en) * | 2020-04-17 | 2020-08-28 | 北京百度网讯科技有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN111753669A (en) * | 2020-05-29 | 2020-10-09 | 广州幻境科技有限公司 | Hand data identification method, system and storage medium based on graph convolution network |
WO2020207281A1 (en) * | 2019-04-12 | 2020-10-15 | 腾讯科技(深圳)有限公司 | Method for training posture recognition model, and image recognition method and apparatus |
CN112184787A (en) * | 2020-10-27 | 2021-01-05 | 北京市商汤科技开发有限公司 | Image registration method and device, electronic equipment and storage medium |
CN112200041A (en) * | 2020-09-29 | 2021-01-08 | Oppo(重庆)智能科技有限公司 | Video motion recognition method and device, storage medium and electronic equipment |
CN112241731A (en) * | 2020-12-03 | 2021-01-19 | 北京沃东天骏信息技术有限公司 | Attitude determination method, device, equipment and storage medium |
CN112509123A (en) * | 2020-12-09 | 2021-03-16 | 北京达佳互联信息技术有限公司 | Three-dimensional reconstruction method and device, electronic equipment and storage medium |
CN112989947A (en) * | 2021-02-08 | 2021-06-18 | 上海依图网络科技有限公司 | Method and device for estimating three-dimensional coordinates of human body key points |
WO2021134325A1 (en) * | 2019-12-30 | 2021-07-08 | 深圳元戎启行科技有限公司 | Obstacle detection method and apparatus based on driverless technology and computer device |
Also Published As
Publication number | Publication date |
---|---|
CN113724393B (en) | 2024-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670397B (en) | Method and device for detecting key points of human skeleton, electronic equipment and storage medium | |
CN109961507B (en) | Face image generation method, device, equipment and storage medium | |
CN106778773B (en) | Method and device for positioning target object in picture | |
CN107992848B (en) | Method and device for acquiring depth image and computer readable storage medium | |
CN112258404B (en) | Image processing method, device, electronic equipment and storage medium | |
CN114025105B (en) | Video processing method, device, electronic equipment and storage medium | |
CN110706339B (en) | Three-dimensional face reconstruction method and device, electronic equipment and storage medium | |
CN109840917B (en) | Image processing method and device and network training method and device | |
CN114007099A (en) | Video processing method and device for video processing | |
CN112509123A (en) | Three-dimensional reconstruction method and device, electronic equipment and storage medium | |
CN112767288A (en) | Image processing method and device, electronic equipment and storage medium | |
WO2023168957A1 (en) | Pose determination method and apparatus, electronic device, storage medium, and program | |
CN113724393B (en) | Three-dimensional reconstruction method, device, equipment and storage medium | |
CN110929616B (en) | Human hand identification method and device, electronic equipment and storage medium | |
US20230169626A1 (en) | Neural network system and method for restoring images using transformer and generative adversarial network | |
CN114140536A (en) | Pose data processing method and device, electronic equipment and storage medium | |
CN116757970B (en) | Training method of video reconstruction model, video reconstruction method, device and equipment | |
CN113870413A (en) | Three-dimensional reconstruction method and device, electronic equipment and storage medium | |
CN114821799B (en) | Action recognition method, device and equipment based on space-time diagram convolutional network | |
CN112734015B (en) | Network generation method and device, electronic equipment and storage medium | |
CN116805282A (en) | Image super-resolution reconstruction method, model training method, device and electronic equipment | |
CN114095646B (en) | Image processing method, image processing device, electronic equipment and storage medium | |
JP7261889B2 (en) | Positioning method and device based on shared map, electronic device and storage medium | |
CN111223114B (en) | Image area segmentation method and device and electronic equipment | |
CN115379195A (en) | Video generation method and device, electronic equipment and readable storage medium |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |