CN113393582A - Three-dimensional object reconstruction algorithm based on deep learning - Google Patents

Three-dimensional object reconstruction algorithm based on deep learning

Info

Publication number
CN113393582A
CN113393582A (Application CN202110563571.6A)
Authority
CN
China
Prior art keywords
dimensional
voxel
occupation
neural network
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110563571.6A
Other languages
Chinese (zh)
Inventor
贾海涛
刘欣月
张诗涵
李玉琳
邹新雷
任利
许文波
罗俊海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110563571.6A priority Critical patent/CN113393582A/en
Publication of CN113393582A publication Critical patent/CN113393582A/en
Pending legal-status Critical Current

Classifications

    • G06T17/20 — Three dimensional [3D] modelling: finite element generation, e.g. wire-frame surface description, tesselation
    • G06N3/047 — Neural networks, architecture: probabilistic or stochastic networks
    • G06N3/048 — Neural networks, architecture: activation functions
    • G06N3/08 — Neural networks: learning methods
    • G06T19/00 — Manipulating 3D models or images for computer graphics
    • G06T2200/08 — Indexing scheme involving all processing steps from image acquisition to 3D model generation


Abstract

The invention discloses a three-dimensional object reconstruction algorithm based on deep learning, which comprises the following steps: inputting a plurality of two-dimensional images of an object obtained from arbitrary angles; preprocessing the two-dimensional images; establishing a convolutional neural network; inputting the two-dimensional images into the network as training data for training; and inputting the two-dimensional images to be tested into the trained convolutional neural network model, which outputs the three-dimensional reconstruction result. In the invention, the convolutional neural network model comprises an encoder, a decoder and a multi-view feature combination module. The encoder takes the multi-view two-dimensional images as input and outputs two-dimensional feature vectors, which are then converted into three-dimensional information; this three-dimensional information is fed into the decoder to obtain the predicted three-dimensional voxel occupancy of each single image; the multi-view feature combination module then produces the final predicted voxel occupancy. In the testing stage, accuracy is computed by comparing the binary 0-1 occupancy produced by the hierarchical prediction strategy against the ground-truth occupancy.

Description

Three-dimensional object reconstruction algorithm based on deep learning
Technical Field
The invention belongs to the field of computer vision and deep learning, and particularly relates to a three-dimensional object reconstruction algorithm based on deep learning.
Background
In recent years, with the advent of public datasets of three-dimensional objects, the complete and accurate reconstruction of three-dimensional geometry from images has become a research hotspot in fields such as computer vision and industrial manufacturing. For example, AR and VR applications emerging in the 5G era rely on three-dimensional reconstruction to convey a realistic sense of scenes transmitted in real time; in industrial manufacturing, robotic-arm grasping and obstacle avoidance, as well as path planning and obstacle avoidance for autonomous vehicles, all make full use of three-dimensional reconstruction technology. In addition, one can obtain more information from a three-dimensional model than from a two-dimensional image. Three-dimensional object reconstruction has therefore become increasingly important.
On the other hand, with the development of computer hardware and artificial intelligence technology, reconstructing three-dimensional models with deep learning tools has become a major trend in current research. Deep-learning-based three-dimensional object reconstruction can recover the three-dimensional geometry of an object from a single image or multiple images without complex and precise camera calibration procedures.
At present, most deep-learning-based three-dimensional reconstruction algorithms share a common problem: when we see an object from a single viewpoint, it is difficult to infer its overall shape because the object occludes itself. In terms of two-dimensional images, a single image contains limited information, from which an accurate and complete three-dimensional model cannot be deduced. To solve this problem, researchers have proposed reconstructing the three-dimensional model from multiple images of the same object taken from different viewpoints, jointly considering the information contained in all of them.
Current multi-view three-dimensional reconstruction algorithms feed the features of each image into an LSTM and exploit its memory to fuse the contained information. Although the reconstruction quality may improve to some extent as the number of views increases, this approach still has a drawback: because of the temporal ordering inherent in the LSTM structure, the order of the input images affects the final reconstruction result, which clearly contradicts the original intent of the network design.
Therefore, the invention designs a deep-learning-based three-dimensional object reconstruction algorithm, referred to as the 3D FONet (3D Reconstruction from Object Network). It is a convolutional neural network for multi-view three-dimensional reconstruction that maps from two-dimensional images to three-dimensional geometry. 3D FONet can be trained and tested without any additional input such as object class labels or pose information, and its reconstruction result does not change when the order of the input images changes.
Disclosure of Invention
The invention provides a three-dimensional object reconstruction algorithm based on deep learning that can be trained and tested without any additional input such as object class labels or pose information, and whose reconstruction result does not change when the order of the input images changes. The invention improves the structures of the encoder and the decoder, and applies a hierarchical prediction strategy to improve the three-dimensional reconstruction quality. See the description below for details.
The solution of the invention for solving the technical problem is as follows:
a three-dimensional object reconstruction algorithm based on deep learning, the three-dimensional object reconstruction algorithm comprising the steps of:
step 1, inputting a plurality of two-dimensional images of an object obtained from arbitrary angles;
step 2, establishing a convolutional neural network model;
step 3, inputting the two-dimensional image in the step 1 as training data into the convolutional neural network established in the step 2 for training;
and step 4, inputting the two-dimensional images to be tested into the convolutional neural network model trained in step 3, which outputs the three-dimensional reconstruction result.
The convolutional neural network model in step 2 comprises an encoder, a decoder and a multi-view feature combination module. The encoder is a ResNet50 network with SE-Block embedded; its convolutional layers are regularized using BatchNorm, with the ReLU function as activation. The decoder consists of three-dimensional deconvolution layers, BatchNorm layers, three-dimensional unpooling layers and ReLU activations. The encoder takes the multi-view two-dimensional images as input and outputs two-dimensional feature vectors, which must be converted into three-dimensional information; the decoder takes the three-dimensional information obtained by converting the encoder's output vectors and produces the predicted three-dimensional voxel occupancy of a single image; the multi-view feature combination module takes the predicted three-dimensional voxel occupancy of each two-dimensional image and outputs the final predicted voxel occupancy.
As a further improvement of the above technical solution, step 3 specifically comprises the following steps (a training-step sketch follows this list):
3.1 randomly initializing the convolutional neural network model;
3.2 inputting each image independently into the two-dimensional encoder for feature extraction;
3.3 converting the two-dimensional feature vectors extracted in step 3.2 into three-dimensional information;
3.4 inputting the three-dimensional information obtained in step 3.3 into the three-dimensional decoder for decoding, generating three-dimensional probability voxels and obtaining the predicted voxel occupancy;
3.5 combining the predicted voxel occupancies obtained for each image in step 3.4 in the multi-view feature combination module to obtain the final predicted voxel occupancy;
3.6 optimizing the parameters of the model step by step via a cross-entropy loss function.
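To make steps 3.1 to 3.6 concrete, the following is a minimal PyTorch sketch of one training step. It is an illustration under stated assumptions, not the patent's implementation: the encoder, decoder and combine arguments stand in for the modules described above, and reshaping the 4 × 4 × 2048 image feature into a 4 × 4 × 4 × 512 volume is one plausible reading of the two-dimensional-to-three-dimensional conversion in step 3.3.

    import torch
    import torch.nn.functional as F

    def train_step(encoder, decoder, combine, images, gt_voxels, optimizer):
        # images: (n_views, 3, 256, 256); gt_voxels: (32, 32, 32), values in {0, 1}
        per_view = []
        for img in images:                       # step 3.2: encode each view independently
            feat = encoder(img.unsqueeze(0))     # (1, 2048, 4, 4) two-dimensional features
            vol = feat.reshape(1, 512, 4, 4, 4)  # step 3.3: convert to three-dimensional information
            probs = decoder(vol)                 # step 3.4: (1, 2, 32, 32, 32) probability voxels
            per_view.append(probs[0, 1])         # occupied-class probability grid
        O = combine(torch.stack(per_view))       # step 3.5: multi-view feature combination
        loss = F.binary_cross_entropy(           # step 3.6: sum of voxel cross entropies
            O.clamp(1e-7, 1 - 1e-7), gt_voxels.float(), reduction="sum")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()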
As a further improvement of the above technical solution, step 4 specifically comprises the following steps:
4.1 inputting the set of two-dimensional images to be tested into the convolutional neural network trained in steps 3.2 to 3.6 to obtain the predicted voxel probability O;
4.2 computing the accuracy by comparing the binary 0-1 occupancy produced by the hierarchical prediction strategy with the ground-truth occupancy.
The invention has the beneficial effects that: the invention trains the convolutional neural network model on multiple two-dimensional images and then uses the trained model to reconstruct the two-dimensional images to be tested. The encoder is a ResNet50 network with SE-Block embedded; its convolutional layers are regularized using BatchNorm, with the ReLU function as activation. The decoder consists of three-dimensional deconvolution layers, BatchNorm layers, three-dimensional unpooling layers and ReLU activations. The encoder takes the multi-view two-dimensional images as input and outputs two-dimensional feature vectors, which must be converted into three-dimensional information; the decoder takes this three-dimensional information and produces the predicted three-dimensional voxel occupancy of a single image; the multi-view feature combination module takes the predicted voxel occupancy of each two-dimensional image and outputs the final predicted voxel occupancy.
The invention has the beneficial effects that: the invention incorporates a multi-view feature combination module into the model to combine the features of multiple input images. From everyday experience, when we see an object we can infer its approximate shape by observing its appearance; because the object occludes itself, we can only roughly guess its shape from experience but cannot determine it. The most straightforward way to know the shape is to walk around the object. Inspired by this, after an image is input, the designed convolutional neural network produces the current predicted voxel occupancy probability O_t, where t denotes the t-th image. For that image, the directly observable part of the object receives relatively certain probabilities, while for the parts hidden by self-occlusion the network tries to predict the probability that each voxel grid is occupied from prior knowledge. The more views are input, the more of the object becomes visible, the more voxel grids have a determined occupancy, and the reconstruction quality of the model improves continuously.
The invention has the beneficial effects that: the invention improves on traditional multi-view three-dimensional reconstruction algorithms by adding a hierarchical prediction strategy to the model, which judges from outside to inside whether each voxel grid is occupied according to the final predicted voxel occupancy probability O, dynamically adjusting the voxel-grid threshold according to the occupancy of the outer-layer voxel grids. If few outer-layer voxel grids are occupied, a smaller threshold is selected, and vice versa. As prediction proceeds deeper into the object, the results become progressively less biased. The introduction of the hierarchical prediction strategy improves the reconstruction of the thinner parts of the object.
Drawings
In order to illustrate the technical solution in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Clearly, the described figures cover only some embodiments of the invention rather than all of them, and a person skilled in the art can derive other designs and figures from them without inventive effort.
FIG. 1 is a schematic diagram of a convolutional neural network model structure of the present invention;
FIG. 2 is a flow chart of a three-dimensional reconstruction algorithm of the present invention;
FIG. 3 is a block diagram of an encoder network according to the present invention;
FIG. 4 shows the distribution of model counts per category in the ShapeNet dataset of the present invention;
fig. 5 is a comparison of several algorithm reconstruction results after inputting different numbers of multi-view images.
Detailed Description
The conception, specific structure and technical effects of the present invention are described clearly and completely below with reference to the embodiments and drawings, so that the reader can fully understand the objects, features and effects of the invention. Obviously, the described embodiments are only a part of the embodiments of the invention rather than all of them; all other embodiments that a person skilled in the art obtains from them without inventive effort fall within the protection scope of the invention.
Example 1
Referring to fig. 1 and fig. 2, the invention discloses a three-dimensional object reconstruction algorithm based on deep learning, comprising the following steps:
step 1, inputting a plurality of two-dimensional images of an object obtained from arbitrary angles;
step 2, establishing a convolutional neural network model;
step 3, inputting the two-dimensional image in the step 1 as training data into the convolutional neural network established in the step 2 for training;
and step 4, inputting the two-dimensional images to be tested into the convolutional neural network model trained in step 3, which outputs the three-dimensional reconstruction result.
The convolutional neural network model in step 2 comprises an encoder, a decoder and a multi-view feature combination module. The encoder mines the three-dimensional spatial structure of the object by extracting two-dimensional image features; it is a ResNet50 network with SE-Block embedded, its convolutional layers are regularized using BatchNorm, and the ReLU function is selected as activation. The decoder consists of three-dimensional deconvolution layers, BatchNorm layers, three-dimensional unpooling layers and ReLU activations. The encoder takes the multi-view two-dimensional images as input and outputs two-dimensional feature vectors, which must be converted into three-dimensional information; the decoder takes this three-dimensional information and produces the predicted three-dimensional voxel occupancy of a single image; the multi-view feature combination module takes the predicted voxel occupancy of each two-dimensional image and outputs the final predicted voxel occupancy.
Example 2
The scheme of Example 1 is described in detail below with reference to specific calculation formulas and examples:
the number of the structural bodies of the convolutional neural network model in the step 2 is 3, and the structural bodies are respectively as follows: encoder, decoder, multiview feature combination module. The encoder network designed by the invention is based on a ResNet network, and is added with SE-Block so that the model has a simple attention mechanism. The ReLU activation function was chosen for each convolutional layer and regularized using BatchNorm. The SE-Block module can improve the expression capability of the features at a small cost, and only different weights need to be distributed to different channels. The network structure of the encoder is shown in fig. 3.
In a specific embodiment of the present invention, the encoder network embeds an SE-Block module, which first applies global average pooling to the W × H × C feature map to obtain a 1 × 1 × C feature map that perceives the global information of the image; this operation is called Squeeze. The Excitation operation follows: two fully connected layers whose result is limited to between 0 and 1 by a Sigmoid function. The final values represent the weight of each channel, realizing the attention mechanism.
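As an illustration only, a minimal PyTorch sketch of such an SE-Block follows; the reduction ratio of 16 is the common SE-Net default and is an assumption, since the patent does not specify it.

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        # Squeeze: global average pooling of the W x H x C feature map to 1 x 1 x C.
        # Excitation: two fully connected layers with a Sigmoid limiting the result
        # to (0, 1); the outputs are the per-channel attention weights.
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            w = x.mean(dim=(2, 3))           # Squeeze
            w = self.fc(w).view(b, c, 1, 1)  # Excitation
            return x * w                     # channel-wise reweighting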
In the embodiment of the invention, the encoder improves on the ResNet50 network. The input size of the encoder is 256 × 256 × 3 and its output is 4 × 4 × 2048. The encoder of the present invention is modified as follows:
1. SE-Block is added to each residual module, giving the model a simple attention mechanism. The network configurations of ResNet50 and SE-ResNet50 are shown in FIG. 3;
2. the input size is changed from the 224 × 224 of the standard ResNet50 network to 256 × 256;
3. the final fully connected layer of ResNet50 is removed;
4. the encoder output size is changed to 4 × 4.
In a specific embodiment of the present invention, the decoder is configured to decode the three-dimensional features into a three-dimensional volume. Similar to the encoder, the decoder of the present invention also comprises five residual blocks, each composed of a Conv3d layer, a BN layer, 3D unpooling and ReLU, except that the last block consists of a convolutional layer only. The 4 × 4 × 2048 feature obtained from the encoder is converted into a three-dimensional 4 × 4 × 4 × 512 volume and input into the decoder; Conv3d and unpooling processing yield a resolution of 32 × 32 × 32 × 2, and finally softmax normalization produces the prediction probability O_i of the three-dimensional volume.
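A hedged PyTorch sketch of such a decoder follows. Only the layer types (three-dimensional convolutions, BatchNorm, ReLU, a final convolution and softmax) and the 32 × 32 × 32 × 2 output follow the text; the channel widths and the use of three stride-2 transposed convolutions (4³ → 8³ → 16³ → 32³) in place of the patent's unpooling layers are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class VoxelDecoder(nn.Module):
        def __init__(self):
            super().__init__()
            def up(c_in, c_out):  # one upsampling block: deconv + BN + ReLU
                return nn.Sequential(
                    nn.ConvTranspose3d(c_in, c_out, kernel_size=4, stride=2, padding=1),
                    nn.BatchNorm3d(c_out),
                    nn.ReLU(inplace=True),
                )
            self.net = nn.Sequential(
                up(512, 128),  # 4^3  -> 8^3
                up(128, 32),   # 8^3  -> 16^3
                up(32, 8),     # 16^3 -> 32^3
                nn.Conv3d(8, 2, kernel_size=3, padding=1),  # two classes: empty / occupied
            )

        def forward(self, v: torch.Tensor) -> torch.Tensor:
            # v: (B, 512, 4, 4, 4) -> (B, 2, 32, 32, 32) per-voxel class probabilities
            return torch.softmax(self.net(v), dim=1)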
Further, in the embodiment of the present invention, the cross-entropy loss function commonly used in classification tasks is selected to optimize the training of the network model. Assuming that the prediction follows a binomial distribution, the loss function is computed as equation (1):

L_i = -[ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]    (1)

where y_i = 1 denotes a positive sample, y_i = 0 denotes a negative sample, and p_i denotes the probability that sample i is predicted to be positive.
The network designed by the invention requires the sum of the voxel cross entropies, given by equation (2):

L = -Σ_{i,j,k} [ GT_{i,j,k} log(O_{i,j,k}) + (1 - GT_{i,j,k}) log(1 - O_{i,j,k}) ]    (2)

where GT_{i,j,k} is the value of the ground-truth voxel grid at coordinate (i, j, k): GT_{i,j,k} = 1 indicates the voxel grid is occupied and GT_{i,j,k} = 0 indicates it is empty; O_{i,j,k} is the predicted probability of the voxel grid at that coordinate in the final predicted voxel occupancy.
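A direct PyTorch transcription of equation (2) could look as follows; the clamping epsilon is a numerical-stability detail not stated in the patent.

    import torch

    def voxel_cross_entropy(O: torch.Tensor, GT: torch.Tensor) -> torch.Tensor:
        # O:  predicted occupancy probabilities, shape (32, 32, 32), values in (0, 1)
        # GT: ground-truth occupancy, same shape, values in {0, 1}
        eps = 1e-7
        O = O.clamp(eps, 1.0 - eps)
        return -(GT * torch.log(O) + (1.0 - GT) * torch.log(1.0 - O)).sum()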
Further, in an embodiment of the present invention, the multi-view feature combination module is configured to combine the features of multiple input images. In the algorithm structure designed by the invention, each input image yields a predicted voxel occupancy probability O_t, and this probability is biased towards the visible part of the object. Each input image is taken from a different angle, so as two-dimensional images are continuously input, the fused visible portion of the object grows. As the model fuses more and more information, the occupancy of the voxel grid becomes more and more definite, and the reconstruction performance of the model improves continuously.
More concretely, the final predicted voxel occupancy probability O is determined by equation (3):

O_{i,j,k} = max_{t=1,…,n} O^t_{i,j,k}    (3)

where O_{i,j,k} is the probability of the voxel grid at (i, j, k), O^t_{i,j,k} is the probability of that voxel grid obtained from the t-th image, and the final predicted voxel occupancy probability O takes the maximum over the n predicted voxel occupancy probabilities obtained from all n input images.
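Equation (3) amounts to an element-wise maximum over the per-view probability grids, as in this one-line PyTorch sketch:

    import torch

    def combine_views(per_view_occupancy: torch.Tensor) -> torch.Tensor:
        # per_view_occupancy: (n, 32, 32, 32), one probability grid per input image
        return per_view_occupancy.max(dim=0).values  # voxel-wise max over the n views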
Further, in a specific embodiment of the present invention, the hierarchical prediction strategy judges from outside to inside whether each voxel grid is occupied according to the final predicted voxel occupancy probability O, and dynamically adjusts the voxel-grid threshold according to the occupancy of the outer-layer voxel grids: if few outer-layer voxel grids are occupied, a smaller threshold is selected, and vice versa.
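The patent does not state the exact threshold-update rule, so the following PyTorch sketch is only one plausible realization: shells are indexed by their distance from the grid boundary, and the linear update together with the base and span constants are assumptions made for illustration.

    import torch

    def hierarchical_predict(O: torch.Tensor, base: float = 0.5, span: float = 0.2) -> torch.Tensor:
        # O: fused occupancy probabilities, shape (n, n, n); returns boolean occupancy.
        n = O.shape[0]
        idx = torch.arange(n)
        i, j, k = torch.meshgrid(idx, idx, idx, indexing="ij")
        # shell depth 0 is the outermost layer of the voxel grid
        depth = torch.minimum(torch.minimum(torch.minimum(i, n - 1 - i),
                                            torch.minimum(j, n - 1 - j)),
                              torch.minimum(k, n - 1 - k))
        occ = torch.zeros_like(O, dtype=torch.bool)
        threshold = base
        for s in range(n // 2):                      # judge shells from outside to inside
            shell = depth == s
            occ[shell] = O[shell] >= threshold
            ratio = occ[shell].float().mean()        # fraction of this shell occupied
            threshold = base + span * (ratio - 0.5)  # sparse shell -> smaller threshold
        return occ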
In summary, the present invention provides a novel three-dimensional object reconstruction algorithm that can be trained and tested without any additional input such as object class labels or pose information, and whose reconstruction result does not change when the order of the input images changes.
Example 3
The following experiments were performed to verify the feasibility of the protocols of examples 1 and 2, as described in detail below:
1) experimental data set
The invention selects a subset of the ShapeNet dataset, the ShapeNetCore dataset, for training and testing. ShapeNetCore is a subset of the full ShapeNet dataset, containing 55 common object categories and approximately 51,300 three-dimensional models. The invention selects the 13 categories with more than 1,000 models each, 43,783 three-dimensional models in total. The selected subset categories are: airplane (plane), chair, car, table, sofa (couch), stool (bench), cabinet, display (monitor), table lamp (lamp), speaker, rifle, telephone and boat (vessel).
Each model is rendered as 256 × 256 resolution images taken from 12 different angles and saved together with its ground-truth voxel occupancy at a resolution of 32 × 32 × 32. This dataset is hereinafter referred to as the ShapeNet dataset. FIG. 4 shows the number of models in each category of the dataset.
2) Evaluation criteria
The present invention uses the voxel-based Intersection-over-Union ratio IoU to quantitatively evaluate reconstruction performance. IoU is given by equation (4):

IoU = |Prediction ∩ GroundTruth| / |Prediction ∪ GroundTruth|    (4)

where Prediction denotes the final predicted voxel occupancy and GroundTruth denotes the ground-truth voxel occupancy. IoU approaches 0 when the model predicts voxel grids incorrectly (for example, predicting voxels whose ground-truth value is 0 as 1, or voxels whose ground-truth value is 1 as 0), and approaches 1 when the prediction agrees with the ground truth.
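Over boolean occupancy grids, equation (4) reduces to a few tensor operations; a PyTorch sketch:

    import torch

    def voxel_iou(prediction: torch.Tensor, ground_truth: torch.Tensor) -> float:
        # prediction, ground_truth: boolean (or 0/1) occupancy grids of equal shape
        pred, gt = prediction.bool(), ground_truth.bool()
        intersection = (pred & gt).sum().item()
        union = (pred | gt).sum().item()
        return intersection / union if union > 0 else 1.0  # both empty counts as a match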
3) Comparison method
The method was compared experimentally with two methods:
3D-R2N2: proposed by Choy et al., it uses a deep convolutional neural network to learn, in a purely end-to-end manner from a large amount of training data, the mapping from two-dimensional images of objects to their three-dimensional geometry.
3D-FHNet: proposed by Lu et al., its network uses a feature combination method that treats every input image equally together with a hierarchical prediction strategy that improves the reconstruction of the thinner parts of the object, enabling the model to perform three-dimensional reconstruction from any number of images.
4) Network parameter configuration
During training, the Adam (adaptive moment estimation) optimization method is used for network training. The parameters of the Adam optimizer are set as follows: learning rate 0.0001, updated every 10 epochs; weight decay 1e-5; beta_1 = 0.9; beta_2 = 0.999; batch size 8. In addition, epsilon is set to 10e-8 to prevent division by zero during computation.
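In PyTorch, this configuration would look roughly as follows. The StepLR decay factor gamma is an assumption, since the patent states only that the learning rate is updated every 10 epochs, and the placeholder module stands in for the 3D FONet model defined elsewhere.

    import torch

    model = torch.nn.Linear(4, 2)  # placeholder standing in for the 3D FONet network
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-4,             # learning rate 0.0001
        betas=(0.9, 0.999),  # beta_1, beta_2
        eps=10e-8,           # epsilon, prevents division by zero
        weight_decay=1e-5,
    )
    # learning rate updated every 10 epochs; batch size 8 is set in the DataLoader
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)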
5) Results of the experiment
FIG. 5 illustrates the effect of different numbers of views on the three-dimensional reconstruction results.
As can be seen from FIG. 5, for the 3D FONet algorithm proposed by the present invention, the quality of three-dimensional object reconstruction improves as the number of views increases. Compared with the 3D-FHNet baseline, the proposed 3D FONet is slightly weaker in single-view reconstruction; when 3 views, 6 views and 12 views are input, however, the reconstruction performance of 3D FONet exceeds that of 3D-FHNet, which can be attributed to the encoder and decoder modules designed by the invention. Compared with the 3D-R2N2 algorithm, for the same number of input views the three-dimensional reconstruction of the proposed 3D FONet algorithm is better than that of 3D-R2N2.
In summary, the experimental process, the experimental data and the experimental results in the embodiments of the present invention verify the feasibility of the schemes in embodiments 1 and 2, and the three-dimensional object reconstruction algorithm provided in the embodiments of the present invention has a good three-dimensional reconstruction capability for a two-dimensional image.

Claims (5)

1. A three-dimensional object reconstruction algorithm based on deep learning is characterized by comprising the following steps:
step 1: inputting a plurality of two-dimensional images of an object obtained from an arbitrary angle;
step 2: establishing a convolutional neural network model;
and step 3: inputting the two-dimensional image in the step 1 as training data into the convolutional neural network established in the step 2 for training;
step 4: inputting the two-dimensional images to be tested into the convolutional neural network model trained in step 3, which outputs the three-dimensional reconstruction result.
2. The method of claim 1, wherein: the convolutional neural network model in step 2 comprises three structures: the encoder, the decoder and the multi-view feature combination module. The encoder is a ResNet50 network with SE-Block embedded; its convolutional layers are regularized using BatchNorm, with the ReLU function as activation. The decoder consists of three-dimensional deconvolution layers, BatchNorm layers, three-dimensional unpooling layers and ReLU activations. The encoder takes the multi-view two-dimensional images as input and outputs two-dimensional feature vectors, which must be converted into three-dimensional information; the decoder takes this three-dimensional information and produces the predicted three-dimensional voxel occupancy of a single image; the multi-view feature combination module takes the predicted voxel occupancy of each two-dimensional image and outputs the final predicted voxel occupancy.
3. The method according to claim 2, wherein step 3 comprises the following steps:
3.1 randomly initializing the convolutional neural network model;
3.2 inputting each image independently into the two-dimensional encoder for feature extraction;
3.3 converting the two-dimensional feature vector F_i extracted in step 3.2 into three-dimensional information V_i;
3.4 inputting the three-dimensional information V_i obtained in step 3.3 into the three-dimensional decoder for decoding, generating three-dimensional probability voxels and obtaining the predicted voxel occupancy O_i;
3.5 combining the predicted voxel occupancies O_i obtained for each image in step 3.4 in the multi-view feature combination module to obtain the final predicted voxel occupancy O;
3.6 optimizing the parameters of the model step by step via a cross-entropy loss function.
4. The method of claim 3, wherein: the step 4 specifically comprises the following steps:
4.1 inputting the set of two-dimensional images to be tested into the convolutional neural network trained in steps 3.2 to 3.6 to obtain the predicted voxel probability O;
4.2 computing the accuracy by comparing the binary 0-1 occupancy produced by the hierarchical prediction strategy with the ground-truth occupancy.
5. The method of claim 4, wherein: the hierarchical prediction strategy of step 4.2 judges from outside to inside whether each voxel grid is occupied according to the final predicted voxel occupancy probability O from the multi-view feature combination module, and dynamically adjusts the voxel-grid threshold according to the occupancy of the outer-layer voxel grids.
CN202110563571.6A 2021-05-24 2021-05-24 Three-dimensional object reconstruction algorithm based on deep learning Pending CN113393582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110563571.6A CN113393582A (en) 2021-05-24 2021-05-24 Three-dimensional object reconstruction algorithm based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110563571.6A CN113393582A (en) 2021-05-24 2021-05-24 Three-dimensional object reconstruction algorithm based on deep learning

Publications (1)

Publication Number Publication Date
CN113393582A (en) 2021-09-14

Family

ID=77619004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110563571.6A Pending CN113393582A (en) 2021-05-24 2021-05-24 Three-dimensional object reconstruction algorithm based on deep learning

Country Status (1)

Country Link
CN (1) CN113393582A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116773534A (en) * 2023-08-15 2023-09-19 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680158A (en) * 2017-11-01 2018-02-09 长沙学院 A kind of three-dimensional facial reconstruction method based on convolutional neural networks model
US20190303724A1 (en) * 2018-03-30 2019-10-03 Tobii Ab Neural Network Training For Three Dimensional (3D) Gaze Prediction With Calibration Parameters
CN110390638A (en) * 2019-07-22 2019-10-29 北京工商大学 A kind of high-resolution three-dimension voxel model method for reconstructing
CN110543581A (en) * 2019-09-09 2019-12-06 山东省计算中心(国家超级计算济南中心) Multi-view three-dimensional model retrieval method based on non-local graph convolution network
CN111652966A (en) * 2020-05-11 2020-09-11 北京航空航天大学 Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
CN111968235A (en) * 2020-07-08 2020-11-20 杭州易现先进科技有限公司 Object attitude estimation method, device and system and computer equipment
CN112734911A (en) * 2021-01-07 2021-04-30 北京联合大学 Single image three-dimensional face reconstruction method and system based on convolutional neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680158A (en) * 2017-11-01 2018-02-09 长沙学院 A kind of three-dimensional facial reconstruction method based on convolutional neural networks model
US20190303724A1 (en) * 2018-03-30 2019-10-03 Tobii Ab Neural Network Training For Three Dimensional (3D) Gaze Prediction With Calibration Parameters
CN110390638A (en) * 2019-07-22 2019-10-29 北京工商大学 A kind of high-resolution three-dimension voxel model method for reconstructing
CN110543581A (en) * 2019-09-09 2019-12-06 山东省计算中心(国家超级计算济南中心) Multi-view three-dimensional model retrieval method based on non-local graph convolution network
CN111652966A (en) * 2020-05-11 2020-09-11 北京航空航天大学 Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
CN111968235A (en) * 2020-07-08 2020-11-20 杭州易现先进科技有限公司 Object attitude estimation method, device and system and computer equipment
CN112734911A (en) * 2021-01-07 2021-04-30 北京联合大学 Single image three-dimensional face reconstruction method and system based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李欣冉 (LI Xinran): "Research on Key Technologies of Three-Dimensional Moving Target Reconstruction Based on Deep Convolutional Neural Networks", China Excellent Master's and Doctoral Theses Full-text Database (Master's), Information Science and Technology Series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116773534A (en) * 2023-08-15 2023-09-19 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium
CN116773534B (en) * 2023-08-15 2024-03-05 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium

Similar Documents

Publication Publication Date Title
CN110544297B (en) Three-dimensional model reconstruction method for single image
CN110659727B (en) Sketch-based image generation method
CN111259945B (en) Binocular parallax estimation method introducing attention map
CN110427799B (en) Human hand depth image data enhancement method based on generation of countermeasure network
CN110570522B (en) Multi-view three-dimensional reconstruction method
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN112084934B (en) Behavior recognition method based on bone data double-channel depth separable convolution
CN109005398B (en) Stereo image parallax matching method based on convolutional neural network
CN110070595A (en) A kind of single image 3D object reconstruction method based on deep learning
CN110335344A (en) Three-dimensional rebuilding method based on 2D-3D attention mechanism neural network model
CN113112607B (en) Method and device for generating three-dimensional grid model sequence with any frame rate
CN110956655B (en) Dense depth estimation method based on monocular image
CN111861906A (en) Pavement crack image virtual augmentation model establishment and image virtual augmentation method
CN112614070B (en) defogNet-based single image defogging method
CN110852935A (en) Image processing method for human face image changing with age
CN112634438A (en) Single-frame depth image three-dimensional model reconstruction method and device based on countermeasure network
CN115546442A (en) Multi-view stereo matching reconstruction method and system based on perception consistency loss
CN112489198A (en) Three-dimensional reconstruction system and method based on counterstudy
CN112509021A (en) Parallax optimization method based on attention mechanism
CN115861418A (en) Single-view attitude estimation method and system based on multi-mode input and attention mechanism
CN113393582A (en) Three-dimensional object reconstruction algorithm based on deep learning
Wang et al. DepthNet Nano: A highly compact self-normalizing neural network for monocular depth estimation
CN115860113B (en) Training method and related device for self-countermeasure neural network model
CN115527052A (en) Multi-view clustering method based on contrast prediction
CN113808006B (en) Method and device for reconstructing three-dimensional grid model based on two-dimensional image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210914