CN113724393B - Three-dimensional reconstruction method, device, equipment and storage medium

Info

Publication number: CN113724393B
Authority: CN (China)
Application number: CN202110924536.2A
Other versions: CN113724393A (Chinese)
Prior art keywords: dimensional, features, vertexes, image, vertices
Inventors: 陈星宇, 郑文
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Events: application CN202110924536.2A filed by Beijing Dajia Internet Information Technology Co Ltd; publication of CN113724393A; application granted; publication of CN113724393B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tessellation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features


Abstract

The present disclosure relates to a three-dimensional reconstruction method, apparatus, device, and storage medium. The method includes: acquiring an image to be processed, the image to be processed containing a target part; inputting the image to be processed into a first network to obtain image features of the image to be processed and heat maps of M two-dimensional key points of the target part, M being a positive integer; inputting the image features and the heat maps of the M two-dimensional key points into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system; and inputting the features of the N three-dimensional vertices into a third network to obtain coordinates of K three-dimensional vertices, the coordinates of the K three-dimensional vertices being used to generate a three-dimensional model of the target part in the three-dimensional coordinate system, N and K being positive integers. The three-dimensional reconstruction model is lightweight, which facilitates extending the application scenarios of three-dimensional reconstruction.

Description

Three-dimensional reconstruction method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular, to a three-dimensional reconstruction method, apparatus, device, and storage medium.
Background
Three-dimensional reconstruction (3D Reconstruction) is an important research direction in the field of computer vision, and reconstructing a target part in three dimensions from a monocular image has great theoretical significance and application value. In the related art, when a target part (for example, a human body part such as a hand) in an image is three-dimensionally reconstructed, a convolutional neural network is used to extract image features and predict the shape parameters and posture parameters of a three-dimensional model, from which the coordinates of the target part in three-dimensional space are calculated. Such three-dimensional reconstruction methods generally require a large amount of computation and a large number of parameters, which limits the application scenarios to which three-dimensional reconstruction can be extended.
Disclosure of Invention
The disclosure provides a three-dimensional reconstruction method, device, equipment and storage medium, so as to at least solve the problem in the related art that three-dimensional reconstruction generally requires a large amount of computation and a large number of parameters, which limits the application scenarios of three-dimensional reconstruction. The technical scheme of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a three-dimensional reconstruction method, including:
acquiring an image to be processed, wherein the image to be processed contains a target part;
inputting the image to be processed into a first network to obtain image features of the image to be processed and heat maps of M two-dimensional key points of the target part, wherein M is a positive integer;
inputting the image features of the image to be processed and the heat maps of the M two-dimensional key points into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system;
inputting the features of the N three-dimensional vertices into a third network to obtain coordinates of K three-dimensional vertices, wherein N and K are positive integers, and the coordinates of the K three-dimensional vertices are used for generating a three-dimensional model of the target part in the three-dimensional coordinate system.
In one embodiment, the step of obtaining the features of the N three-dimensional vertices in the three-dimensional coordinate system includes:
for the heat map of each two-dimensional key point, fusing the heat map with the image features to obtain features of the two-dimensional key point;
converting the features of the M two-dimensional key points into features of N three-dimensional vertices based on a preset mapping matrix, wherein the N three-dimensional vertices are all or part of the K three-dimensional vertices required for reconstructing the target part, the mapping matrix characterizes the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertices, and N is less than or equal to K.
In one embodiment, the step of obtaining the coordinates of the K three-dimensional vertices includes:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertices, and after each feature mapping operation, updating the features of the three-dimensional vertices through one feature fusion operation to obtain final features of the K three-dimensional vertices;
when N = K, performing one feature fusion operation based on the features of the N three-dimensional vertices to obtain final features of the K three-dimensional vertices;
wherein the feature mapping operation includes: mapping the features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of three-dimensional vertices in the other set being greater than the number in the one set;
the feature fusion operation includes: determining a preset neighborhood corresponding to each current three-dimensional vertex, performing first feature fusion processing on the features of the same dimension of the three-dimensional vertex and of each three-dimensional vertex in the preset neighborhood to obtain a fused feature of that dimension, and performing second feature fusion processing on the fused features of all dimensions to update the features of the three-dimensional vertex;
and obtaining the coordinates of the K three-dimensional vertices based on the final features of the K three-dimensional vertices.
In one embodiment, the value of N is less than or equal to a preset value.
In one embodiment, before converting the features of the M two-dimensional key points into features of N three-dimensional vertices in the three-dimensional coordinate system based on the preset mapping matrix, the method further includes:
when N is smaller than K, downsampling the K three-dimensional vertices multiple times to obtain the N three-dimensional vertices.
In one embodiment, before the step of acquiring the image to be processed, the method further comprises:
acquiring an original image;
detecting the target part in the original image;
expanding outward, by a preset multiple, the region centered on the detected target part, to obtain the image to be processed.
According to a second aspect of embodiments of the present disclosure, there is provided a three-dimensional reconstruction apparatus including:
a first acquisition unit configured to acquire an image to be processed containing a target part;
a first input unit configured to input the image to be processed into a first network to obtain image features of the image to be processed and heat maps of M two-dimensional key points of the target part, wherein M is a positive integer;
a second input unit configured to input the image features of the image to be processed and the heat maps of the M two-dimensional key points into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system;
and a third input unit configured to input the features of the N three-dimensional vertices into a third network to obtain coordinates of K three-dimensional vertices for generating a three-dimensional model of the target part in the three-dimensional coordinate system, wherein N and K are positive integers.
In one embodiment, the second input unit is specifically configured to perform:
for the heat map of each two-dimensional key point, fusing the heat map with the image features to obtain features of the two-dimensional key point;
converting the features of the M two-dimensional key points into features of N three-dimensional vertices based on a preset mapping matrix, wherein the N three-dimensional vertices are all or part of the K three-dimensional vertices required for reconstructing the target part, the mapping matrix characterizes the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertices, and N is less than or equal to K.
In one embodiment, the third input unit is specifically configured to perform:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertices, and after each feature mapping operation, updating the features of the three-dimensional vertices through one feature fusion operation to obtain final features of the K three-dimensional vertices;
when N = K, performing one feature fusion operation based on the features of the N three-dimensional vertices to obtain final features of the K three-dimensional vertices;
wherein the feature mapping operation includes: mapping the features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of three-dimensional vertices in the other set being greater than the number in the one set;
the feature fusion operation includes: determining a preset neighborhood corresponding to each current three-dimensional vertex, performing first feature fusion processing on the features of the same dimension of the three-dimensional vertex and of each three-dimensional vertex in the preset neighborhood to obtain a fused feature of that dimension, and performing second feature fusion processing on the fused features of all dimensions to update the features of the three-dimensional vertex;
and obtaining the coordinates of the K three-dimensional vertices based on the final features of the K three-dimensional vertices.
In one embodiment, the value of N is less than or equal to a preset value.
In one embodiment, the apparatus further comprises:
and the sampling unit is configured to perform downsampling on the K three-dimensional vertexes for a plurality of times when N is smaller than K so as to obtain N three-dimensional vertexes.
In one embodiment, the apparatus further comprises:
a second acquisition unit configured to acquire an original image; detect the target part in the original image; and expand outward, by a preset multiple, the region centered on the detected target part, to obtain the image to be processed.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the three-dimensional reconstruction method according to any one of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the three-dimensional reconstruction method according to any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the three-dimensional reconstruction method according to any one of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
The acquired image to be processed containing the target part is input into a first network to obtain image features of the image to be processed and heat maps of M two-dimensional key points of the target part; these are input into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system; and those features are input into a third network to obtain coordinates of K three-dimensional vertices, so that a three-dimensional model of the target part can be generated in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. Three-dimensional reconstruction is thus realized by a three-dimensional reconstruction model formed by the first network, the second network and the third network, in which the features of the N three-dimensional vertices are obtained from the image features and the heat maps of the M two-dimensional key points, and the coordinates of the K three-dimensional vertices are then obtained, so that the model is lightweight and the application scenarios of three-dimensional reconstruction can be extended.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of a hand two-dimensional keypoint illustrated in accordance with an exemplary embodiment.
Fig. 2a is a schematic diagram of an image to be processed, as illustrated according to an exemplary embodiment.
Fig. 2b is a schematic diagram of an exemplary three-dimensional model of a hand according to an exemplary embodiment.
Fig. 2c is a schematic diagram of an exemplary three-dimensional model of a hand according to an exemplary embodiment.
Fig. 3 is a schematic diagram of an application scenario illustrated in accordance with an example embodiment.
Fig. 4 is a schematic diagram of an application scenario illustrated in accordance with an example embodiment.
Fig. 5 is a flow chart illustrating a three-dimensional reconstruction method according to an exemplary embodiment.
Fig. 6 is a flow chart illustrating a three-dimensional reconstruction method according to an exemplary embodiment.
Fig. 7 is a schematic diagram of a Ghost module shown in accordance with an exemplary embodiment.
FIG. 8 is a schematic diagram illustrating a feature fusion operation according to an example embodiment.
Fig. 9 is a flow chart illustrating a three-dimensional reconstruction method according to an exemplary embodiment.
Fig. 10 is a flow chart illustrating a three-dimensional reconstruction method according to an exemplary embodiment.
Fig. 11 is a flow chart illustrating a three-dimensional reconstruction method according to an exemplary embodiment.
Fig. 12 is a block diagram of a three-dimensional reconstruction apparatus according to an exemplary embodiment.
Fig. 13 is a block diagram of an electronic device, shown in accordance with an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Three-dimensional reconstruction (3D Reconstruction) is an important research direction in the field of computer vision, and reconstructing a target part in three dimensions from a monocular image has great theoretical significance and application value. In the related art, when a target part (for example, a human body part such as a hand) in an image is three-dimensionally reconstructed, a convolutional neural network is used to extract image features and predict the shape parameters and posture parameters of a three-dimensional model, from which the coordinates of the target part in three-dimensional space are calculated. This approach generally requires a large amount of computation and a large number of parameters, which limits the application scenarios of three-dimensional reconstruction. For example, it is generally difficult to run in real time on a terminal.
Take three-dimensional hand reconstruction (Hand Mesh Recovery) in the related art as an example; the objective is to obtain position information of the hand in three-dimensional space. Specifically, a convolutional neural network is used to extract image features and predict the shape parameters and posture parameters of the hand MANO model, from which the coordinates of the hand mesh in three-dimensional space are calculated. According to the definition of the MANO model, the hand can be represented by 778 vertices, which together with triangular patches form a mesh. Two-dimensional key point prediction of the hand requires an algorithm to estimate the image coordinates of the skeletal joints of the hand; the definition of the two-dimensional key points of the hand is shown in fig. 1. An example of three-dimensional hand reconstruction is shown in figs. 2a, 2b and 2c: fig. 2a is an image to be processed, and figs. 2b and 2c are the three-dimensional hand model at different viewing angles. This way of three-dimensional hand reconstruction requires a very large amount of computation and a very large number of parameters, which limits the application scenarios of three-dimensional reconstruction; for example, it is difficult to run in real time on a terminal.
Therefore, the embodiments of the present disclosure provide a three-dimensional reconstruction method that greatly reduces the amount of computation and the number of parameters, makes the three-dimensional reconstruction model lightweight, and facilitates extending application scenarios according to actual requirements. The three-dimensional reconstruction method provided by the embodiments of the present disclosure can be applied to a server. For example, in the application scenario shown in fig. 3, the server 301 may perform the three-dimensional reconstruction method and send the result of the three-dimensional reconstruction to the terminal 302. This improves the response speed of the server and facilitates extension to more application scenarios. The three-dimensional reconstruction method provided by the embodiments of the present disclosure may also be applied to a terminal; for example, in the application scenario shown in fig. 4, the terminal 302 may execute the three-dimensional reconstruction method. In this way, the three-dimensional reconstruction method can run in real time on the terminal, reducing the dependence on the server, and more application scenarios can be supported.
The terminal can be a mobile terminal such as a smartphone, laptop, handheld computer, or tablet computer. The terminal is provided with an application program, and the three-dimensional reconstruction method provided by the embodiments of the present disclosure can be realized through the application program. The application may be a short-video application, a photo application, or the like.
The server may be a physical server, a cloud server, etc.
The three-dimensional reconstruction method provided by the embodiment of the present disclosure is described in detail below.
Fig. 5 is a flowchart illustrating a three-dimensional reconstruction method according to an exemplary embodiment, for use in a terminal or a server, including the following steps.
In step S51, an image to be processed including a target portion is acquired.
In step S52, the image to be processed is input into the first network to obtain image features of the image to be processed and heat maps of M two-dimensional key points of the target part, where M is a positive integer.
In step S53, the image features of the image to be processed and the heat maps of the M two-dimensional key points are input into the second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system.
In step S54, the features of the N three-dimensional vertices are input into a third network to obtain coordinates of K three-dimensional vertices, where N and K are positive integers, and the coordinates of the K three-dimensional vertices are used to generate a three-dimensional model of the target part in the three-dimensional coordinate system.
The target part refers to the part to be three-dimensionally reconstructed. The target part may be a human body part, such as a hand or a face, or may be a part of another object, such as a part of an animal body. The three-dimensional model of the target part may be a mesh-based three-dimensional model, in which case the three-dimensional vertices are the vertices of that mesh.
The heat map of a two-dimensional key point characterizes the position of that key point in the image to be processed.
It can be understood that the three-dimensional reconstruction method of this embodiment is implemented by a three-dimensional reconstruction model formed by the first network, the second network and the third network, all three of which are learnable networks.
In this embodiment, the acquired image to be processed containing the target part is input into the first network to obtain image features of the image to be processed and heat maps of the M two-dimensional key points of the target part; these are input into the second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system; and those features are input into the third network to obtain coordinates of K three-dimensional vertices, so that a three-dimensional model of the target part can be generated in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. Three-dimensional reconstruction is thus realized by the three-dimensional reconstruction model formed by the first network, the second network and the third network, in which the features of the N three-dimensional vertices are obtained from the image features and the heat maps of the M two-dimensional key points, and the coordinates of the K three-dimensional vertices are then obtained.
In addition, after the coordinates of the K three-dimensional vertices are obtained, if the execution body of the three-dimensional reconstruction method is a terminal, the terminal may generate the three-dimensional model of the target part in the three-dimensional coordinate system based on the coordinates of the K three-dimensional vertices. If the execution body is a server, the server may generate the three-dimensional model itself, or the coordinates of the K three-dimensional vertices may be passed to a terminal, which then generates the three-dimensional model of the target part in the three-dimensional coordinate system.
In practical application, image samples containing the target part can be collected and annotated in advance, and the annotated image samples can be input into the three-dimensional reconstruction model for training to obtain the first network, the second network and the third network. For the specific training method, reference may be made to the related art, which is not described herein.
In an exemplary embodiment, as shown in fig. 6, the step of obtaining the features of the N three-dimensional vertices in the three-dimensional coordinate system may be implemented as follows:
In step S61, for the heat map of each two-dimensional key point, the heat map is fused with the image features to obtain features of that two-dimensional key point.
The heat map of a two-dimensional key point is the same size as the image to be processed.
In this step, the heat map of each two-dimensional key point is fused with the image features to obtain the features of the two-dimensional key point, so that the spatial information in the features of the two-dimensional key points becomes more prominent.
In step S62, the features of the M two-dimensional key points are converted into features of N three-dimensional vertices based on a preset mapping matrix, where the N three-dimensional vertices are all or part of the K three-dimensional vertices required for reconstructing the target part, the mapping matrix characterizes the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertices, and N is less than or equal to K.
The preset mapping matrix is a learnable mapping matrix. The mapping matrix is also referred to as a linear mapping matrix, that is, a quantitative representation of a linear mapping; a linear mapping is a mapping from one vector space to another. Here, a preset mapping matrix A of size M×N characterizes the mapping relation between the features of the M two-dimensional key points in the image to be processed and the features of the N three-dimensional vertices in the three-dimensional coordinate system, and contains M×N parameters. The specific values of M, N and K can be set according to the actual situation. Taking the hand as the target part as an example, generally M = 21 two-dimensional key points can be set for the hand and K = 778 three-dimensional vertices can be set for three-dimensional hand reconstruction, with the value of N less than or equal to 778, for example N = 49. On this basis, the number of parameters in the mapping matrix is 21×49 (i.e. 1029), and at most 21×778 (i.e. 16338), while the number of parameters of convolutional neural networks in the related art is far higher, in some cases more than a million.
In this embodiment, the acquired image to be processed containing the target part is input into the first network to obtain features of the M two-dimensional key points of the target part; these features are converted into features of N three-dimensional vertices in the three-dimensional coordinate system based on the preset mapping matrix; and the features of the N three-dimensional vertices are input into the third network to obtain the coordinates of the K three-dimensional vertices, from which the three-dimensional model of the target part is generated in the three-dimensional coordinate system. Because a mapping matrix is used, the number of parameters it contains is very small; compared with the parameters of convolutional neural networks and the like in the related art, the number of parameters is greatly reduced and the amount of computation is reduced correspondingly. Moreover, if the value of N is smaller than K, that is, the features of the M two-dimensional key points are mapped to only part of the three-dimensional vertices, the number of parameters in the mapping matrix is further reduced and the amount of computation further decreases correspondingly, so that the three-dimensional reconstruction model as a whole is lightweight, which facilitates extending the application scenarios of three-dimensional reconstruction.
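Since this 2D-3D conversion reduces to a single learnable matrix product, it can be sketched in a few lines. The following is a minimal sketch assuming PyTorch and the hand setting M = 21, N = 49; the class name, feature dimension C and initialization are illustrative assumptions, not taken from the disclosure.

```python
import torch
import torch.nn as nn

class PoseToVertexMapping(nn.Module):
    """Converts features of M 2D keypoints into features of N 3D vertices
    with a single learnable M-by-N matrix (M*N parameters, e.g. 21*49 = 1029)."""

    def __init__(self, m_keypoints=21, n_vertices=49):
        super().__init__()
        # The preset (learnable) mapping matrix; initialization is illustrative.
        self.mapping = nn.Parameter(0.01 * torch.randn(n_vertices, m_keypoints))

    def forward(self, keypoint_features):
        # keypoint_features: (batch, M, C) -> vertex features: (batch, N, C)
        return torch.einsum('nm,bmc->bnc', self.mapping, keypoint_features)
```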
In an exemplary embodiment, the step of fusing the heat map of a two-dimensional key point with the image features may specifically include: multiplying the heat map of the two-dimensional key point with the image features, and then pooling. The pooling may be max pooling, sum pooling, or the like. In this way, non-key-point features are suppressed by multiplying the image features by the heat map of the two-dimensional key point, and the spatial dimensions are reduced by pooling.
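As a concrete illustration of this multiply-then-pool fusion, below is a minimal PyTorch sketch; the function name, tensor shapes and the choice between max and sum pooling are assumptions for illustration.

```python
import torch

def pose_pooling(image_features, heatmaps, mode='max'):
    """Fuse each keypoint heat map with the image features: multiply to
    suppress non-keypoint responses, then pool away the spatial dimensions.

    image_features: (batch, C, H, W)
    heatmaps:       (batch, M, H, W), one heat map per 2D keypoint
    returns:        (batch, M, C) keypoint features
    """
    # (batch, M, 1, H, W) * (batch, 1, C, H, W) -> (batch, M, C, H, W)
    weighted = heatmaps.unsqueeze(2) * image_features.unsqueeze(1)
    if mode == 'max':
        return weighted.flatten(3).max(dim=3).values   # max pooling
    return weighted.flatten(3).sum(dim=3)              # sum pooling
```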
In an exemplary embodiment, the heat maps of the M two-dimensional key points of the target part may be obtained from the image features. Specifically, the image features may be upsampled by alternating convolution and bilinear sampling to obtain the heat maps of the two-dimensional key points. Alternatively, the image features may be upsampled by alternating convolution and nearest-neighbor interpolation to obtain the heat maps of the two-dimensional key points. The upsampling mode can be chosen flexibly according to the actual situation.
In an exemplary embodiment, the step of obtaining the image features may specifically include: extracting image features from the image to be processed based on a convolutional neural network, or extracting image features from the image to be processed based on a Ghost module.
The network structure of the Ghost module is shown in fig. 7 and includes a first convolution layer 701, a first grouped convolution layer 702, an average pooling layer 703, a fully connected layer 704, a sigmoid activation function 705, a second convolution layer 706, and a second grouped convolution layer 707. The image to be processed is input into the first convolution layer 701 to obtain a first feature; the first feature is input into the first grouped convolution layer 702 to obtain a second feature; the first feature and the second feature are concatenated (denoted by c in fig. 7) to obtain a third feature, which is input into the average pooling layer 703 and, after processing by the average pooling layer 703, the fully connected layer 704 and the sigmoid 705, yields a fourth feature; the fourth feature is multiplied by the third feature (denoted by × in fig. 7) to obtain a fifth feature, which is input into the second convolution layer 706 to obtain a sixth feature; the sixth feature is input into the second grouped convolution layer 707 to obtain a seventh feature, the sixth feature is concatenated with the seventh feature to obtain an eighth feature, and the eighth feature is combined with the module input (denoted by + in fig. 7) to obtain the image features.
The idea of the Ghost module comes from GhostNet: richer features can be obtained from cheap linear mappings of the basic features (here, the concatenation of the first and second features), so that more feature expressions are obtained with less computation. Using the Ghost module instead of a conventional convolutional neural network to extract image features greatly reduces the number of parameters and the amount of computation.
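A minimal PyTorch sketch of a block with the structure of fig. 7 follows; the channel counts and kernel sizes are assumptions (the disclosure does not specify them), the fully connected layer 704 is approximated by a 1×1 convolution, and the final "+" is read as a residual addition with the block input.

```python
import torch
import torch.nn as nn

class GhostBlock(nn.Module):
    """Sketch of the Ghost-style block of fig. 7: a primary convolution, a cheap
    grouped convolution producing extra ('ghost') features, a pooling/FC/sigmoid
    gate, a second conv/grouped-conv pair, and a residual connection."""

    def __init__(self, channels=64):
        super().__init__()
        half = channels // 2
        self.primary = nn.Sequential(                       # first convolution layer 701
            nn.Conv2d(channels, half, 1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(                         # first grouped convolution 702
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        self.gate = nn.Sequential(                          # avg pool 703 + FC 704 + sigmoid 705
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.primary2 = nn.Sequential(                      # second convolution layer 706
            nn.Conv2d(channels, half, 1, bias=False), nn.BatchNorm2d(half))
        self.cheap2 = nn.Sequential(                        # second grouped convolution 707
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),
            nn.BatchNorm2d(half))

    def forward(self, x):
        first = self.primary(x)
        third = torch.cat([first, self.cheap(first)], dim=1)    # concat -> third feature
        fifth = third * self.gate(third)                        # fourth * third -> fifth
        sixth = self.primary2(fifth)
        eighth = torch.cat([sixth, self.cheap2(sixth)], dim=1)  # concat -> eighth feature
        return x + eighth                                       # residual '+' with input
```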
In an exemplary embodiment, the third network is a graph convolutional network. A graph convolutional network can perform convolution operations on a graph, and thus better extract the features in the graph. In practical applications, a conventional graph convolutional network may be used as the third network. In the embodiments of the present disclosure, to further lighten the three-dimensional reconstruction model, the inventors improve on the graph convolutional network and provide a depth-separable graph convolutional network, which is described in detail in the following embodiments.
In step S54, the step of obtaining the coordinates of the K three-dimensional vertices may specifically include: when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertices, and after each feature mapping operation, updating the features of the three-dimensional vertices through one feature fusion operation to obtain final features of the K three-dimensional vertices; when N = K, performing one feature fusion operation based on the features of the N three-dimensional vertices to obtain final features of the K three-dimensional vertices; and obtaining the coordinates of the K three-dimensional vertices based on their final features.
The feature mapping operation may specifically include: mapping the features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of three-dimensional vertices in the other set being greater than the number in the one set.
The feature fusion operation may specifically include: determining a preset neighborhood corresponding to each current three-dimensional vertex, performing first feature fusion processing on the features of the same dimension of the three-dimensional vertex and of each three-dimensional vertex in the preset neighborhood to obtain a fused feature of that dimension, and performing second feature fusion processing on the fused features of all dimensions to update the features of the three-dimensional vertex.
In practical application, when N = K, the features of all three-dimensional vertices are obtained directly by converting the features of the M two-dimensional key points; the feature mapping operation can then be omitted, and a single feature fusion operation is performed directly on the features of the N three-dimensional vertices.
When N is smaller than K, the features of the M two-dimensional key points are converted to only part of the three-dimensional vertices, corresponding to a coarse three-dimensional model, and further refinement is needed to obtain the features of the K three-dimensional vertices. In practice, refinement can be achieved with at least one feature mapping operation: the features of the N three-dimensional vertices may be mapped to the features of the K three-dimensional vertices in a single feature mapping operation, or in multiple feature mapping operations. After each feature mapping operation, the features of the three-dimensional vertices are updated by a feature fusion operation.
For example, let N = 49 and K = 778. The features of the 49 three-dimensional vertices may be mapped to features of 98 three-dimensional vertices in a first feature mapping operation, and the features of the 98 vertices updated by a feature fusion operation; the features of the 98 vertices are then mapped to features of 195 vertices in a second feature mapping operation and updated by a feature fusion operation; the features of the 195 vertices are mapped to features of 396 vertices in a third feature mapping operation and updated by a feature fusion operation; and finally the features of the 396 vertices are mapped to features of 778 vertices in a fourth feature mapping operation and updated by a feature fusion operation, giving the final features of the 778 three-dimensional vertices.
In this embodiment, when N is smaller than K, the features of the N three-dimensional vertices can be gradually mapped to the features of the K three-dimensional vertices through at least one feature mapping operation, so that the features before and after each mapping remain close to each other, giving a better three-dimensional effect. The features of the three-dimensional vertices are updated through feature fusion operations. During fusion, features of the same dimension (i.e., features along the spatial direction) are fused first, and the fused features of different dimensions (i.e., features along the depth direction) are fused afterwards; that is, feature fusion is decomposed into two fusions, one in the spatial direction and one in the depth direction. This realizes depth-separable feature fusion, avoids generating higher-dimensional features, and further reduces the number of parameters and the amount of computation.
When the feature mapping operation is performed, the features of one set of three-dimensional vertices can be mapped to the features of another set of three-dimensional vertices based on a preset mapping relation.
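As one way to realize such a feature mapping operation, the sketch below applies a fixed upsampling matrix to the vertex features; it assumes PyTorch, and the matrix itself (for example, derived from the mesh downsampling used to build the coarse model) is assumed given rather than specified by the disclosure.

```python
import torch

def feature_mapping(vertex_features, upsample_matrix):
    """Map features of a small set of vertices onto a larger set using a
    preset (fixed) mapping matrix.

    vertex_features:  (batch, n_small, C)
    upsample_matrix:  (n_large, n_small), fixed, with n_large > n_small
    returns:          (batch, n_large, C)
    """
    return torch.einsum('ls,bsc->blc', upsample_matrix, vertex_features)
```

Chaining four such operations, each followed by a feature fusion operation, realizes the 49 → 98 → 195 → 396 → 778 progression of the example above.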
The first feature fusion processing may be to concatenate the features of the same dimension to obtain a first concatenated feature and convolve the first concatenated feature to obtain the fused feature of that dimension. The second feature fusion processing may be to concatenate the fused features of all dimensions to obtain a second concatenated feature and convolve the second concatenated feature to obtain the updated features of the three-dimensional vertex. In this way, a depth-separable graph convolution can be achieved. The convolution may be a spiral convolution (SpiralConv, Spiral Convolution), in which case the preset neighborhood is the spiral region.
Take three-dimensional vertex features of dimension three as an example. As shown in fig. 8, assume the neighborhood of three-dimensional vertex 0 includes three-dimensional vertex 1 and three-dimensional vertex 2. The features of vertex 0 are {a1, a2, a3} (shown in fig. 8 filled with dots of three different densities), the features of vertex 1 are {b1, b2, b3} (diagonal stripes of three different densities), and the features of vertex 2 are {c1, c2, c3} (crossing grids of three different densities).
For the first dimension, the feature {a1} of vertex 0, {b1} of vertex 1 and {c1} of vertex 2 are concatenated to obtain {a1, b1, c1}, which is convolved to obtain the fused feature {f1} of that dimension (shown filled with the vertical stripes of largest spacing in fig. 8). Similarly, {a2, b2, c2} is convolved to obtain the fused feature {f2} of the second dimension (medium-spaced vertical stripes), and {a3, b3, c3} is convolved to obtain the fused feature {f3} of the third dimension (vertical stripes of smallest spacing). This process is also known as the depth-wise operation.
Then the fused features of the dimensions, {f1}, {f2} and {f3}, are concatenated to obtain {f1, f2, f3}, which is convolved to obtain the updated feature {a1'} of three-dimensional vertex 0, i.e., the output feature in fig. 8. The features of each three-dimensional vertex are updated in the same way, a process also known as the point-wise operation.
In addition, the first feature fusion processing may be performed in other ways, for example, by taking a weighted sum of the features of the same dimension and convolving it to obtain the fused feature of that dimension. The second feature fusion processing may likewise be performed in other ways, for example, by taking a weighted sum of the fused features of all dimensions and convolving it to obtain the updated features of the three-dimensional vertex.
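The depth-wise and point-wise operations above can be written compactly as grouped and 1×1 one-dimensional convolutions. A minimal PyTorch sketch follows; the class name, the `spiral_indices` tensor listing each vertex's spiral neighborhood, and the exact layer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthSeparableSpiralConv(nn.Module):
    """Sketch of a depth-separable spiral convolution. Depth-wise step: fuse,
    per feature dimension, the S same-dimension features of the spiral
    neighborhood. Point-wise step: fuse across feature dimensions."""

    def __init__(self, channels, spiral_size):
        super().__init__()
        # one weight vector of length S per channel (depth-wise fusion)
        self.depthwise = nn.Conv1d(channels, channels, kernel_size=spiral_size,
                                   groups=channels, bias=False)
        # 1x1 convolution mixing the fused per-dimension features (point-wise)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x, spiral_indices):
        # x: (batch, V, C); spiral_indices: (V, S) vertex indices per neighborhood
        b, v, c = x.shape
        s = spiral_indices.shape[1]
        neigh = x[:, spiral_indices.reshape(-1)].reshape(b, v, s, c)
        # depth-wise: (batch*V, C, S) -> (batch*V, C, 1)
        fused = self.depthwise(neigh.permute(0, 1, 3, 2).reshape(b * v, c, s))
        # point-wise: mix the C fused features -> (batch, V, C)
        return self.pointwise(fused).reshape(b, v, c)
```

Under these assumptions the parameter count drops from roughly C×C×S for a plain SpiralConv layer to C×S + C×C, in line with the reduction described above.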
In an exemplary embodiment, the value of N is less than or equal to a preset value, which is kept small. If the value of N is too large, it is far from the number M of two-dimensional key points and the feature gap is large; in some scenes, directly converting the features of a small number of two-dimensional key points into features of a large number of three-dimensional vertices may distort the converted features of the N three-dimensional vertices and degrade the three-dimensional model. To improve the quality of the three-dimensional model, the value of N can therefore be constrained to avoid it being too large.
In an exemplary embodiment, before converting the features of the M two-dimensional key points into the features of the N three-dimensional vertices in the three-dimensional coordinate system based on the preset mapping matrix, the three-dimensional reconstruction method may further include: when N is smaller than K, downsampling the K three-dimensional vertices multiple times to obtain the N three-dimensional vertices. Again taking N = 49 and K = 778 as an example, the 778 three-dimensional vertices can be downsampled twice, by a factor of 4 each time, to obtain the 49 three-dimensional vertices. Downsampling in several steps retains the three-dimensional vertices more reasonably, so that the coarse model formed by the retained vertices is closer to the real target part.
In an exemplary embodiment, before the step of acquiring the image to be processed, as shown in fig. 9, the three-dimensional reconstruction method may further include:
in step S91, an original image is acquired.
In practical applications, the original image may be an original still image or a video image extracted from a video.
In step S92, target part detection is performed on the original image.
In step S93, the region where the detected target part is located is expanded outward by a preset multiple, centered on that region, to obtain the image to be processed.
The specific value of the preset multiple may be set according to practical situations, for example, 1.3 times.
In this embodiment, the image to be processed is obtained by expanding the region where the target part is located by a preset multiple, so it contains not only the information of the target part but also the information around it. The image to be processed thus carries richer information, which helps improve the three-dimensional reconstruction effect.
Of course, the region where the target part is located may also be used directly as the image to be processed.
In an exemplary embodiment, the first network may include at least one 2D encoder and a 2D decoder corresponding to each 2D encoder, and may further include a pooling layer. If multiple 2D encoders are included, the output of the previous 2D encoder is fused (e.g., concatenated) with the output of its corresponding 2D decoder and used as the input of the next 2D encoder. In step S52, inputting the image to be processed into the first network to obtain the heat maps of the M two-dimensional key points of the target part may specifically include: extracting the image features with a 2D encoder, and obtaining the heat maps of the M two-dimensional key points from the image features with a 2D decoder. The heat maps of the two-dimensional key points are then fused with the image features through the pooling layer to obtain the features of the two-dimensional key points. The 2D encoder may be the Ghost module described above. The first network can thus be regarded as realizing 2D encoding; correspondingly, the second network realizes the 2D-3D mapping and the third network realizes 3D decoding.
A three-dimensional reconstruction method provided by the embodiments of the present disclosure will be described in more detail below by taking three-dimensional reconstruction of a hand as an example.
This embodiment provides a lightweight hand-mesh three-dimensional reconstruction technique with a small amount of computation and a small number of parameters, which obtains the MANO model of the hand. Specifically, as shown in fig. 10, the three-dimensional reconstruction method based on a monocular image is divided into three stages: 2D encoding, 2D-3D mapping and 3D decoding. A Ghost module is designed to realize the 2D encoding, a pose pooling and pose-vertex mapping method is designed to realize the 2D-3D mapping, and a depth-separable graph convolution method is designed to realize the 3D decoding.
The specific process of 2D encoding is as follows:
in step one, an original image is acquired.
In the second step, the hand is detected on the original image.
In this step, a conventional hand detection method, such as CenterNet (a target detection network), can be used to determine the position of the hand and obtain a detection frame.
In the third step, the region where the detected hand is located is expanded outward by a preset multiple, centered on that region, to obtain the image to be processed.
In this step, an image block containing the hand (i.e., the image to be processed) is obtained by expanding the detection frame by a factor of 1.3.
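A minimal sketch of this expansion step in plain Python follows; the function name and the clamping to the image bounds are assumptions for illustration.

```python
def expand_box(cx, cy, w, h, scale=1.3, img_w=None, img_h=None):
    """Expand a detection box about its center by a preset multiple (1.3 here)
    so that the crop keeps some context around the hand."""
    new_w, new_h = w * scale, h * scale
    x0, y0 = cx - new_w / 2, cy - new_h / 2
    x1, y1 = cx + new_w / 2, cy + new_h / 2
    if img_w is not None and img_h is not None:   # clamp to image bounds
        x0, y0 = max(0.0, x0), max(0.0, y0)
        x1, y1 = min(float(img_w), x1), min(float(img_h), y1)
    return x0, y0, x1, y1
```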
In the fourth step, the image to be processed is input into the first network to obtain the image features and the heat maps of the M two-dimensional key points of the target part.
In this step, as shown in fig. 10, the first network includes a first 2D encoder 1001, a first 2D decoder 1002, a second 2D encoder 1003, and a second 2D decoder 1004. The first 2D encoder and the second 2D encoder are both Ghost modules. The image to be processed is input into the first 2D encoder 1001; the image features extracted by the first 2D encoder 1001 are input into the first 2D decoder 1002; the heat maps of the M two-dimensional key points obtained by the first 2D decoder 1002 are concatenated with the image features extracted by the first 2D encoder 1001 (denoted by c in fig. 10) and input into the second 2D encoder 1003; the image features extracted by the second 2D encoder 1003 are input into the second 2D decoder 1004; and the heat maps of the M two-dimensional key points are obtained by the second 2D decoder 1004.
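The wiring of this two-stage first network can be sketched as follows, assuming PyTorch; the module arguments stand in for the Ghost encoders and 2D decoders, and the concatenation along the channel dimension is an assumption about how the series connection "c" is realized.

```python
import torch

def first_network(image, enc1, dec1, enc2, dec2):
    """Two-stage 2D encoding of fig. 10: encoder -> decoder -> concatenate
    features with intermediate heat maps -> encoder -> decoder."""
    feat1 = enc1(image)                              # first 2D encoder (Ghost module)
    heat1 = dec1(feat1)                              # first 2D decoder -> M heat maps
    feat2 = enc2(torch.cat([feat1, heat1], dim=1))   # series connection 'c' in fig. 10
    heat2 = dec2(feat2)                              # second 2D decoder -> M heat maps
    return feat2, heat2                              # image features and heat maps
```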
According to the definition of the hand MANO model, the hand has M = 21 key points. The pose pooling method multiplies the image features by the heat maps of the two-dimensional key points to suppress non-key-point features, and then further reduces the spatial dimensions by max pooling or sum pooling to obtain the features of the 21 two-dimensional key points.
The specific procedure for 2D-3D mapping is as follows:
in the fifth step, the thermodynamic diagram of the image feature and the M two-dimensional key points of the target part is input into the second network, and the feature of the N three-dimensional vertices in the three-dimensional coordinate system is obtained.
Specifically, as shown in fig. 10 and 11, the second network includes a pooling layer 1005 and a preset mapping matrix 1006.
The image features extracted by the second 2D encoder 1003 and the heat maps of the M two-dimensional key points obtained by the second 2D decoder 1004 are input into the pooling layer 1005, which multiplies the image features by the heat maps of the M two-dimensional key points respectively and then performs max pooling or sum pooling to obtain the features of the M two-dimensional key points. Since the features of the M two-dimensional key points reflect the pose of the hand, the operation of the pooling layer is also referred to as pose pooling. Then, based on the preset mapping matrix 1006, the features of the M two-dimensional key points are converted into features of N three-dimensional vertices in the three-dimensional coordinate system. Because the features of the M two-dimensional key points reflect the pose of the hand, and the three-dimensional vertices are vertices of the mesh three-dimensional model, this step is also called pose-vertex mapping.
In the pose-vertex mapping stage, this scheme designs a learnable mapping matrix to convert the pose features into vertex features. Since the MANO model has K = 778 vertices, far more than the number of key points, this scheme downsamples the MANO model twice by a factor of 4, obtaining a coarse three-dimensional model containing only N = 49 vertices, so that the 49 vertex features can be obtained by pose-vertex mapping.
The specific process of 3D decoding is as follows:
in the sixth step, inputting the characteristics of the N three-dimensional vertexes into a third network to obtain coordinates of the K three-dimensional vertexes.
In this embodiment, as shown in fig. 10, the third network is a 3D decoder 1007, which may specifically be a graph convolutional network. This scheme designs a depth-separable graph convolution method to realize the 3D decoding, i.e., to obtain the 778 MANO vertex coordinates from the 49 coarse mesh vertex features. First, following SpiralConv, the spiral region of each vertex is defined as the neighborhood of that vertex. For each vertex and the vertices in its neighborhood, this scheme designs a depth-wise operation: the features of the same dimension of each vertex are concatenated, and feature fusion is performed with a one-dimensional convolution to obtain the fused feature of that dimension. Then a point-wise operation is designed: the fused features of the different dimensions are concatenated along the depth direction, and feature fusion is performed again with a one-dimensional convolution. Compared with conventional SpiralConv, this separable structure effectively reduces the amount of computation and the number of parameters.
The effect of this scheme is as follows:
1. This scheme uses the Ghost module for 2D encoding, reducing the computation of this part by a factor of 20 and the number of parameters by a factor of 4.
2. This scheme performs the 2D-3D feature mapping based on pose pooling and pose-vertex mapping, effectively extracting the 2D pose information and converting the pose features into vertex features by a linear method, realizing the feature mapping from two-dimensional key points to mesh vertices, i.e., from 2D to 3D. The whole mapping process contains only 21×49 = 1029 learnable parameters. The computation of this part is reduced by a factor of 50 and the number of parameters by a factor of more than 100.
3. This scheme uses the depth-separable graph convolution for 3D decoding; by decomposing feature fusion into two fusions, one in the depth direction and one in the spatial direction, the amount of computation and the number of parameters of the graph convolution are effectively reduced. The computation and the number of parameters of this part are reduced by a factor of 20.
These three designs together make the model lightweight. The resulting three-dimensional hand reconstruction model contains only 121M multiply-add operations and 5M parameters.
This scheme thus provides a model lightweighting method covering 2D encoding, 2D-3D mapping and 3D decoding, reaching an inference speed of 28 FPS on the CPU of a Qualcomm Snapdragon 855. In this way, the application scenarios of three-dimensional reconstruction can be extended; for example, a hand-mesh reconstruction scheme running in real time on a terminal can be built.
Fig. 12 is a block diagram of a three-dimensional reconstruction apparatus according to an exemplary embodiment. Referring to fig. 12, the apparatus 1200 includes a first acquisition unit 1201, a first input unit 1202, a second input unit 1203, and a third input unit 1204.
The first acquisition unit 1201 is configured to acquire an image to be processed, where the image to be processed includes a target part;
the first input unit 1202 is configured to input the image to be processed into a first network to obtain image features of the image to be processed and thermodynamic diagrams of M two-dimensional key points of the target part, where M is a positive integer;
the second input unit 1203 is configured to input the image features of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points into a second network to obtain features of N three-dimensional vertices in a three-dimensional coordinate system;
the third input unit 1204 is configured to input the features of the N three-dimensional vertices into a third network to obtain coordinates of K three-dimensional vertices, the coordinates of the K three-dimensional vertices being used for generating a three-dimensional model of the target part in the three-dimensional coordinate system, where N and K are positive integers.
In one embodiment, the second input unit 1203 is specifically configured to perform:
for the thermodynamic diagram of each two-dimensional key point, fusing the thermodynamic diagram with the image features to obtain the features of that two-dimensional key point (one possible form of this fusion is sketched after this list);
based on a preset mapping matrix, converting the features of the M two-dimensional key points into features of N three-dimensional vertices, where the N three-dimensional vertices are all or part of the K three-dimensional vertices required for reconstructing the target part, and the mapping matrix characterizes the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertices, with N smaller than or equal to K.
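One plausible reading of this heatmap-feature fusion step (the "pose pooling" mentioned above) is heatmap-weighted spatial pooling, sketched below. The normalization and tensor layout are assumptions, since the disclosure does not fix the exact fusion operator.

```python
import torch

def pose_pooling(image_feats: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of heatmap-based feature fusion: each keypoint's
    heatmap acts as spatial attention over the image feature map, yielding
    one feature vector per keypoint.

    image_feats: (B, C, H, W); heatmaps: (B, M, H, W) -> returns (B, M, C).
    """
    b, m, h, w = heatmaps.shape
    # Normalize each heatmap so it sums to 1 over the spatial grid.
    attn = heatmaps.reshape(b, m, -1)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    feats = image_feats.reshape(b, -1, h * w)            # (B, C, HW)
    return torch.einsum('bmp,bcp->bmc', attn, feats)     # (B, M, C)
```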
In one embodiment, the third input unit 1204 is specifically configured to perform:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertices and, after each feature mapping operation, updating the features of the three-dimensional vertices through one feature fusion operation to obtain final features of the K three-dimensional vertices;
when N = K, performing one feature fusion operation based on the features of the N three-dimensional vertices to obtain final features of the K three-dimensional vertices;
the feature mapping operation comprises: mapping the features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of vertices in the other set being greater than the number in the first set;
the feature fusion operation comprises: determining a preset neighborhood corresponding to each current three-dimensional vertex, performing a first feature fusion process on the features of the same dimension of the vertex and of each vertex in its preset neighborhood to obtain a fused feature of that dimension, and performing a second feature fusion process on the fused features of all dimensions to update the features of the vertex;
and obtaining the coordinates of the K three-dimensional vertices based on their final features (a minimal sketch of the feature mapping operation follows this list).
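A minimal sketch of the feature mapping (vertex upsampling) operation, under the assumption that it is realized as multiplication by a precomputed or learnable K × N upsampling matrix:

```python
import torch

def map_vertex_features(vertex_feats: torch.Tensor,
                        upsample_matrix: torch.Tensor) -> torch.Tensor:
    """Hypothetical feature mapping operation: maps the features of a
    smaller vertex set, shape (B, N, C), onto a larger vertex set,
    shape (B, K, C), using a K x N matrix (for example, derived from
    the mesh transform that downsampled MANO's 778 vertices to 49)."""
    return torch.einsum('kn,bnc->bkc', upsample_matrix, vertex_feats)
```

In this reading, the 3D decoder alternates such mapping steps with the fusion operation above until the full vertex count K is reached, after which a per-vertex linear layer would regress the three coordinates.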
In one embodiment, the value of N is less than or equal to a preset value.
In one embodiment, the apparatus may further include:
and the sampling unit is configured to perform downsampling on the K three-dimensional vertexes for a plurality of times when N is smaller than K so as to obtain N three-dimensional vertexes.
In one embodiment, the apparatus may further include:
a second acquisition unit configured to perform acquisition of an original image; detecting a target part of the original image; and (5) externally expanding a region with the detected target part as a center by a preset multiple to obtain an image to be processed.
The specific manner in which the various units perform their operations in the apparatus of the above embodiment has been described in detail in the embodiments of the method and will not be repeated here.
Fig. 13 is a block diagram of an electronic device, according to an example embodiment. For example, the electronic device 1300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 13, an electronic device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316.
The processing component 1302 generally controls overall operation of the electronic device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1302 may include one or more processors 1320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 1302 can include one or more modules that facilitate interactions between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operations at the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1304 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply assembly 1306 provides power to the various components of the electronic device 1300. The power components 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the electronic device 1300 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. When the electronic device 1300 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 1300 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1314 includes one or more sensors for providing status assessments of various aspects of the electronic device 1300. For example, the sensor assembly 1314 may detect an on/off state of the electronic device 1300 and the relative positioning of components, such as the display and keypad of the electronic device 1300. The sensor assembly 1314 may also detect a change in position of the electronic device 1300 or of a component of the electronic device 1300, the presence or absence of user contact with the electronic device 1300, the orientation or acceleration/deceleration of the electronic device 1300, and a change in temperature of the electronic device 1300. The sensor assembly 1314 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1314 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate communication between the electronic device 1300 and other devices, either wired or wireless. The electronic device 1300 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1316 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above-described three-dimensional reconstruction method.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as the memory 1304 including instructions executable by the processor 1320 of the electronic device 1300 to perform the above-described three-dimensional reconstruction method. Alternatively, the computer readable storage medium may be a non-transitory computer readable storage medium, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, comprising readable program code executable by the processor 1320 of the electronic device 1300 to perform the above-described three-dimensional reconstruction method. Alternatively, the program code may be stored in a storage medium of the electronic device 1300, which may be a non-transitory computer readable storage medium, such as a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (12)

1. A three-dimensional reconstruction method, comprising:
acquiring an image to be processed, wherein the image to be processed comprises a target part;
inputting the image to be processed into a first network to obtain the image characteristics of the image to be processed and thermodynamic diagrams of M two-dimensional key points of the target part, wherein M is a positive integer;
inputting the image characteristics of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points into a second network to obtain the characteristics of N three-dimensional vertexes in a three-dimensional coordinate system;
inputting the characteristics of the N three-dimensional vertexes into a third network to obtain coordinates of K three-dimensional vertexes, wherein the coordinates of the K three-dimensional vertexes are used for generating a three-dimensional model of the target part in the three-dimensional coordinate system, and N and K are positive integers;
wherein the step of obtaining the coordinates of the K three-dimensional vertexes comprises:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertexes, and after each feature mapping operation, updating the features of the three-dimensional vertexes through one feature fusion operation to obtain final features of the K three-dimensional vertexes;
when N=K, performing the feature fusion operation once based on the features of the N three-dimensional vertices to obtain final features of the K three-dimensional vertices;
The feature mapping operation steps comprise: mapping features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of the other set of three-dimensional vertices being greater than the number of the set of three-dimensional vertices;
the feature fusion operation steps comprise: determining a preset neighborhood corresponding to each current three-dimensional vertex, performing first feature fusion processing on the three-dimensional vertex and the features of the same dimension in the features of each three-dimensional vertex in the preset neighborhood to obtain fusion features of the same dimension, and performing second feature fusion processing on the fusion features of each dimension to update the features of the three-dimensional vertex;
and obtaining coordinates of the K three-dimensional vertexes based on the final characteristics of the K three-dimensional vertexes.
2. The three-dimensional reconstruction method according to claim 1, wherein the step of obtaining the characteristics of the N three-dimensional vertices in the three-dimensional coordinate system includes:
for the thermodynamic diagram of each two-dimensional key point, fusing the thermodynamic diagram with the image characteristics to obtain the characteristics of that two-dimensional key point;
based on a preset mapping matrix, converting the features of the M two-dimensional key points into features of the N three-dimensional vertexes, wherein the N three-dimensional vertexes are all or part of K three-dimensional vertexes required for reconstructing the target part, and the mapping matrix characterizes the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertexes, wherein N is smaller than or equal to K.
3. The three-dimensional reconstruction method according to claim 1, wherein the value of N is less than or equal to a preset value.
4. The three-dimensional reconstruction method according to claim 2, wherein before converting the features of the M two-dimensional keypoints into features of N three-dimensional vertices in a three-dimensional coordinate system based on the preset mapping matrix, the method further comprises:
and when N is smaller than K, downsampling the K three-dimensional vertexes for a plurality of times to obtain the N three-dimensional vertexes.
5. The three-dimensional reconstruction method according to claim 1, characterized in that before the step of acquiring the image to be processed, the method further comprises:
acquiring an original image;
detecting a target part of the original image;
and expanding outward by a preset multiple a region centered on the detected target part, to obtain the image to be processed.
6. A three-dimensional reconstruction apparatus, comprising:
a first acquisition unit configured to perform acquisition of an image to be processed, the image to be processed including a target portion;
the first input unit is configured to input the image to be processed into a first network to obtain an image characteristic of the image to be processed and a thermodynamic diagram of M two-dimensional key points of the target part, wherein M is a positive integer;
a second input unit configured to input the image characteristics of the image to be processed and the thermodynamic diagrams of the M two-dimensional key points into a second network to obtain the characteristics of N three-dimensional vertexes in a three-dimensional coordinate system;
a third input unit configured to input the features of the N three-dimensional vertexes into a third network to obtain coordinates of K three-dimensional vertexes, the coordinates being used for generating a three-dimensional model of the target part in the three-dimensional coordinate system, wherein N and K are positive integers;
wherein the third input unit is specifically configured to perform:
when N is smaller than K, performing at least one feature mapping operation based on the features of the N three-dimensional vertexes, and after each feature mapping operation, updating the features of the three-dimensional vertexes through one feature fusion operation to obtain final features of the K three-dimensional vertexes;
when N=K, performing the feature fusion operation once based on the features of the N three-dimensional vertices to obtain final features of the K three-dimensional vertices;
the feature mapping operation steps comprise: mapping features of one set of three-dimensional vertices to features of another set of three-dimensional vertices, the number of the other set of three-dimensional vertices being greater than the number of the set of three-dimensional vertices;
The feature fusion operation steps comprise: determining a preset neighborhood corresponding to each current three-dimensional vertex, performing first feature fusion processing on the three-dimensional vertex and the features of the same dimension in the features of each three-dimensional vertex in the preset neighborhood to obtain fusion features of the same dimension, and performing second feature fusion processing on the fusion features of each dimension to update the features of the three-dimensional vertex;
and obtaining coordinates of the K three-dimensional vertexes based on the final characteristics of the K three-dimensional vertexes.
7. The three-dimensional reconstruction device according to claim 6, wherein the second input unit is specifically configured to perform:
for the thermodynamic diagram of each two-dimensional key point, fusing the thermodynamic diagram with the image characteristics to obtain the characteristics of that two-dimensional key point;
based on a preset mapping matrix, converting the features of the M two-dimensional key points into features of the N three-dimensional vertexes, wherein the N three-dimensional vertexes are all or part of K three-dimensional vertexes required for reconstructing the target part, and the mapping matrix characterizes the mapping relation between the features of the M two-dimensional key points and the features of the N three-dimensional vertexes, wherein N is smaller than or equal to K.
8. The three-dimensional reconstruction device of claim 6, wherein N has a value less than or equal to a preset value.
9. The three-dimensional reconstruction device of claim 7, further comprising:
and the sampling unit is configured to perform downsampling for the K three-dimensional vertexes for a plurality of times when N is smaller than K so as to obtain the N three-dimensional vertexes.
10. The three-dimensional reconstruction device of claim 6, further comprising:
a second acquisition unit configured to perform acquisition of an original image; detecting a target part of the original image; and (3) externally expanding a preset multiple by taking the detected region of the target part as the center to obtain the image to be processed.
11. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the three-dimensional reconstruction method of any one of claims 1 to 5.
12. A computer readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the three-dimensional reconstruction method of any one of claims 1 to 5.
CN202110924536.2A 2021-08-12 2021-08-12 Three-dimensional reconstruction method, device, equipment and storage medium Active CN113724393B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110924536.2A CN113724393B (en) 2021-08-12 2021-08-12 Three-dimensional reconstruction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113724393A CN113724393A (en) 2021-11-30
CN113724393B true CN113724393B (en) 2024-03-19

Family

ID=78675668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110924536.2A Active CN113724393B (en) 2021-08-12 2021-08-12 Three-dimensional reconstruction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113724393B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902702A (en) * 2018-07-26 2019-06-18 华为技术有限公司 The method and apparatus of target detection
CN110991319A (en) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method and related device
CN111047548A (en) * 2020-03-12 2020-04-21 腾讯科技(深圳)有限公司 Attitude transformation data processing method and device, computer equipment and storage medium
CN111275066A (en) * 2018-12-05 2020-06-12 北京嘀嘀无限科技发展有限公司 Image feature fusion method and device and electronic equipment
CN111369681A (en) * 2020-03-02 2020-07-03 腾讯科技(深圳)有限公司 Three-dimensional model reconstruction method, device, equipment and storage medium
CN111383205A (en) * 2020-03-11 2020-07-07 西安应用光学研究所 Image fusion positioning method based on feature points and three-dimensional model
CN111598131A (en) * 2020-04-17 2020-08-28 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN111753669A (en) * 2020-05-29 2020-10-09 广州幻境科技有限公司 Hand data identification method, system and storage medium based on graph convolution network
WO2020207281A1 (en) * 2019-04-12 2020-10-15 腾讯科技(深圳)有限公司 Method for training posture recognition model, and image recognition method and apparatus
CN112184787A (en) * 2020-10-27 2021-01-05 北京市商汤科技开发有限公司 Image registration method and device, electronic equipment and storage medium
CN112200041A (en) * 2020-09-29 2021-01-08 Oppo(重庆)智能科技有限公司 Video motion recognition method and device, storage medium and electronic equipment
CN112241731A (en) * 2020-12-03 2021-01-19 北京沃东天骏信息技术有限公司 Attitude determination method, device, equipment and storage medium
CN112509123A (en) * 2020-12-09 2021-03-16 北京达佳互联信息技术有限公司 Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN112989947A (en) * 2021-02-08 2021-06-18 上海依图网络科技有限公司 Method and device for estimating three-dimensional coordinates of human body key points
WO2021134325A1 (en) * 2019-12-30 2021-07-08 深圳元戎启行科技有限公司 Obstacle detection method and apparatus based on driverless technology and computer device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018129715A1 (en) * 2017-01-13 2018-07-19 浙江大学 Simultaneous positioning and dense three-dimensional reconstruction method

Also Published As

Publication number Publication date
CN113724393A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN109670397B (en) Method and device for detecting key points of human skeleton, electronic equipment and storage medium
CN106778773B (en) Method and device for positioning target object in picture
JP2021526698A (en) Image generation methods and devices, electronic devices, and storage media
CN111860485B (en) Training method of image recognition model, image recognition method, device and equipment
KR20150044730A (en) System and method for 3D model reconstruction
CN110533105B (en) Target detection method and device, electronic equipment and storage medium
CN112767288B (en) Image processing method and device, electronic equipment and storage medium
CN114025105B (en) Video processing method, device, electronic equipment and storage medium
CN113643356B (en) Camera pose determination method, virtual object display method, device and electronic equipment
CN112614228B (en) Method, device, electronic equipment and storage medium for simplifying three-dimensional grid
CN112509123A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
WO2023168957A1 (en) Pose determination method and apparatus, electronic device, storage medium, and program
CN110929616B (en) Human hand identification method and device, electronic equipment and storage medium
CN115984447A (en) Image rendering method, device, equipment and medium
CN111931781A (en) Image processing method and device, electronic equipment and storage medium
CN112714263B (en) Video generation method, device, equipment and storage medium
CN114140536A (en) Pose data processing method and device, electronic equipment and storage medium
CN113870413A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN114067085A (en) Virtual object display method and device, electronic equipment and storage medium
CN116757970A (en) Training method of video reconstruction model, video reconstruction method, device and equipment
CN113724393B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN116486009A (en) Monocular three-dimensional human body reconstruction method and device and electronic equipment
CN114049473A (en) Image processing method and device
CN112734015B (en) Network generation method and device, electronic equipment and storage medium
CN114095646B (en) Image processing method, image processing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant