CN110246212A - Target three-dimensional reconstruction method based on self-supervised learning - Google Patents
Target three-dimensional reconstruction method based on self-supervised learning
- Publication number
- CN110246212A CN110246212A CN201910367420.6A CN201910367420A CN110246212A CN 110246212 A CN110246212 A CN 110246212A CN 201910367420 A CN201910367420 A CN 201910367420A CN 110246212 A CN110246212 A CN 110246212A
- Authority
- CN
- China
- Prior art keywords
- binary map
- point cloud
- image
- self
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The present invention relates to a target three-dimensional reconstruction method based on self-supervised learning, comprising: S1, training a point cloud autoencoder network; S2, training a binary map autoencoder network; S3, inputting an RGB image and obtaining a true binary map; S4, extracting the image pose using Pose net; S5, training an image encoder and generating a preliminary point cloud model; S6, generating a transformed point cloud model; S7, training a point cloud encoder and generating a recovered binary map; S8, computing the mean square error between the recovered binary map and the true binary map, and outputting the result if the error is below a preset threshold, otherwise executing step S9; S9, feeding the mean square error back to the image encoder and returning to step S5. Compared with the prior art, the present invention extracts the image pose with Pose net and adds two-dimensional supervision information, thereby resolving problems such as ambiguous input-image viewpoints and lack of supervision information, and improving the accuracy of target three-dimensional reconstruction.
Description
Technical field
The present invention relates to the fields of computer vision and image processing, and in particular to a target three-dimensional reconstruction method based on self-supervised learning.
Background technique
As a research direction at the intersection of computer vision and computer graphics, three-dimensional reconstruction uses specific devices and algorithms to rebuild mathematical models of real-world three-dimensional objects, and has been widely applied in industries such as intelligent unmanned systems, robotics, computer-aided medicine, virtual reality, and augmented reality. Traditional three-dimensional reconstruction research mostly focuses on multi-view geometry, including SFM and SLAM. Although these methods achieve good results in specific scenes, they also have drawbacks: 1) multi-view geometry cannot reconstruct parts that are missing from the views, so enough views must be input to guarantee the completeness of the reconstructed object; 2) reconstruction from multiple views increases computational complexity, making real-time reconstruction difficult. These drawbacks limit the application of multi-view reconstruction; learning-based methods that realize single-view reconstruction are therefore particularly important.
Learning-based single-view image reconstruction currently comprises two main approaches. The first generates shapes with a 3D CNN, analogous to image generation with a 2D CNN, and represents the shape with voxels. The second, based on a weakly supervised mechanism, fits the 3D shape with an autoencoder; 3D-RecGAN, a new network built on the GAN structure, can recover the voxel structure of an object directly from a single depth map. Both approaches produce 256^3 voxel grids, which place very high demands on hardware, because voxel representations improve surface accuracy only by increasing the number of voxel blocks. To balance computational complexity against surface accuracy, reconstruction methods based on meshes and point clouds have recently been proposed: for example, Pixel2Mesh combines the idea of graph convolution to generate a triangle mesh directly from a single color image, and PSG-Net uses a network framework and loss function for unordered point clouds to achieve point cloud reconstruction superior to voxel representations. However, because single-view reconstruction inherently lacks sufficient supervision information and some input images have ambiguous viewpoints, the reconstructed models may suffer from partial omissions and lack of rich surface detail.
Summary of the invention
It is an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a target three-dimensional reconstruction method based on self-supervised learning, so as to improve the accuracy of target three-dimensional reconstruction.
The object of the present invention is achieved by the following technical solution: a target three-dimensional reconstruction method based on self-supervised learning, comprising the following steps:
S1, training a point cloud autoencoder network to obtain the point cloud latent feature, wherein the point cloud autoencoder network includes a point cloud encoder and a point cloud decoder DP;
S2, training a binary map autoencoder network to obtain the binary map latent feature, wherein the binary map autoencoder network includes a binary map encoder and a binary map decoder DI;
S3, inputting an RGB image and obtaining a true binary map by binarization;
S4, performing feature extraction on the input RGB image with Pose net to obtain the image pose of the input RGB image;
S5, training an image encoder on the input RGB image to obtain a first space feature FP, and generating a preliminary point cloud model from the input RGB image through the point cloud decoder DP of step S1;
S6, applying translation and rotation to the preliminary point cloud model according to the image pose to generate a transformed point cloud model;
S7, training a point cloud encoder on the transformed point cloud model to obtain a second space feature FB, and generating a recovered binary map through the binary map decoder DI of step S2;
S8, computing the mean square error between the recovered binary map and the true binary map; if the mean square error is less than a preset threshold, outputting the transformed point cloud model as the target three-dimensional reconstruction result of the input RGB image, otherwise executing step S9;
S9, feeding the mean square error back to the image encoder and returning to step S5.
Preferably, the detailed process of obtaining the point cloud latent feature in step S1 is as follows:
S11, inputting true point cloud data into the point cloud encoder and obtaining a B × N × 512 feature after 5 layers of 1-D convolution;
S12, applying a max pooling operation to the B × N × 512 feature of step S11 to obtain the B × 512 point cloud latent feature, wherein k = 512.
Preferably, the point cloud decoder DP of step S1 comprises three fully connected layers, which convert the point cloud latent feature into a B × N × 3 point cloud format.
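The S11–S12 encoding path can be sketched as follows. This is a shape-level illustration only: the five 1-D convolutions are omitted (their exact layers are given only in the patent's Table 1, which is an image), and a random B × N × 512 feature stands in for their output; the max pooling over the N points is the part shown.

```python
import numpy as np

def pointcloud_latent_feature(per_point_features: np.ndarray) -> np.ndarray:
    """Collapse per-point features (B x N x 512) into a point cloud
    latent feature (B x 512) by max pooling over the N points, as in
    step S12. The 5 conv layers producing the per-point features are
    assumed to have run already; this sketch starts from their output."""
    assert per_point_features.ndim == 3
    return per_point_features.max(axis=1)

# Example: a batch of B=2 clouds, N=1024 points, k=512 channels.
B, N, k = 2, 1024, 512
feats = np.random.rand(B, N, k)       # stand-in for the conv output
latent = pointcloud_latent_feature(feats)
print(latent.shape)  # (2, 512)
```

Because the maximum is taken over the point axis, the latent feature is invariant to the ordering of the points, which is why the patent can work with unordered point clouds.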
Preferably, the detailed process of obtaining the binary map latent feature in step S2 is as follows:
S21, binarizing the RGB image to obtain the true binary map, wherein binarization represents pixels with value 0 as 0 and pixels with non-zero value as 1;
S22, inputting the true binary map into the binary map encoder to obtain the binary map latent feature, wherein k = 512.
Preferably, the binary map decoder DI of step S2 uses deconvolution operations to fill in image content, so that the image content is gradually enriched to recover the binary map.
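The binarization rule of step S21 is simple enough to state directly in code; the sketch below maps a zero pixel (all channels 0) to 0 and any non-zero pixel to 1.

```python
import numpy as np

def binarize(rgb: np.ndarray) -> np.ndarray:
    """Binarization per step S21: a pixel becomes 0 where every
    channel is 0, and 1 wherever any channel is non-zero."""
    return (rgb != 0).any(axis=-1).astype(np.uint8)

# A black 4x4 image with a 2x2 non-zero square ("the object").
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[1:3, 1:3] = 200
mask = binarize(img)
print(mask.sum())  # 4 foreground pixels
```

This works because object renderings on a black background have exactly-zero background pixels; for natural photographs a threshold would be needed instead, which the patent does not specify.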
Preferably, in step S4 Pose net regresses the image view-angle information with a fully connected layer to obtain the image pose. The image view-angle information comprises six parameters (α, β, γ, a, b, c), wherein (α, β, γ) are deflection angles denoting the yaw, pitch, and roll angles respectively, and (a, b, c) is the translation vector.
The image pose is (R, t), wherein R is the rotation matrix and t = (a, b, c); the conversion formula from the deflection angles (α, β, γ) to the rotation matrix R is as follows:
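The conversion formula itself appears only as an image in the source and is not reproduced here. The sketch below assumes the common Z-Y-X composition (yaw about z, then pitch about y, then roll about x); the patent's exact convention may differ, so treat the axis order as an assumption.

```python
import numpy as np

def rotation_from_angles(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Build a rotation matrix R from yaw (alpha), pitch (beta), and
    roll (gamma), in radians. Z-Y-X composition is assumed; the
    patent's own formula is an image and may use another order."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]])
    return Rz @ Ry @ Rx  # yaw, then pitch, then roll

R = rotation_from_angles(0.1, 0.2, 0.3)
print(np.allclose(R @ R.T, np.eye(3)))  # True: R is orthonormal
```

Whatever the composition order, the result must be orthonormal with determinant 1, which is a useful sanity check on any regressed pose.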
Preferably, step S5 specifically comprises the following steps:
S51, inputting the RGB image into the image encoder to obtain the first space feature FP;
S52, forming a first loss function from the first space feature FP and the point cloud latent feature;
S53, generating the preliminary point cloud model with the point cloud decoder DP according to the first loss function and the first space feature FP.
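The first loss of step S52 matches the image encoder's feature against the frozen point cloud latent feature. The patent's wording ("mean square error constituting an L1-normalized loss") is ambiguous; the sketch below uses a plain mean L1 distance between the two 512-dimensional features as one plausible reading, not the definitive formulation.

```python
import numpy as np

def feature_loss(f_image: np.ndarray, f_cloud: np.ndarray) -> float:
    """First loss (step S52): mean L1 distance between the image
    encoder's first space feature F_P and the point cloud latent
    feature from step S1. An assumed reading of the patent's
    'L1-normalized' loss; the exact formula is not given in text."""
    assert f_image.shape == f_cloud.shape  # both B x 512
    return float(np.abs(f_image - f_cloud).mean())

fp = np.ones((2, 512))   # toy image features
fc = np.zeros((2, 512))  # toy point cloud latent features
print(feature_loss(fp, fc))  # 1.0
```

Driving this loss to zero makes the image encoder emit features the pre-trained decoder DP already knows how to decode, which is why DP can be reused without retraining in step S53.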
Preferably, generating the transformed point cloud model in step S6 specifically multiplies the image pose with the preliminary point cloud model, thereby transforming the preliminary point cloud model to the camera plane:
x′i = R xi + t,  i ∈ [0, N−1]
wherein xi is a point in the preliminary point cloud model, x′i is the corresponding point in the transformed point cloud model, and N denotes the number of points contained in the three-dimensional structure; each point xi is multiplied by the image pose (R, t) to obtain x′i.
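The rigid transform x′i = R xi + t applies to every point at once; vectorized over an N × 3 array it is a single matrix product plus a broadcast add:

```python
import numpy as np

def transform_cloud(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Step S6: apply x'_i = R x_i + t to every point of the
    preliminary point cloud (N x 3), moving it to the camera plane."""
    assert points.shape[1] == 3 and R.shape == (3, 3) and t.shape == (3,)
    return points @ R.T + t  # row-vector form of R @ x + t

pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])  # 90 degrees about the z axis
t = np.array([0.0, 0.0, 1.0])
print(transform_cloud(pts, R, t))  # [[0, 1, 1], [-1, 0, 1]]
```

Note the transpose: with points stored as rows, `points @ R.T` computes R xi for each row, matching the column-vector formula in the text.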
Preferably, step S7 specifically comprises the following steps:
S71, inputting the transformed point cloud model into the point cloud encoder to obtain the second space feature FB;
S72, forming a second loss function from the second space feature FB and the binary map latent feature;
S73, generating the recovered binary map with the binary map decoder DI according to the second loss function and the second space feature FB.
Compared with the prior art, the invention has the following advantages:
1. The present invention performs feature extraction on the RGB image with Pose net to obtain the image pose, giving the network the ability to resolve the image viewpoint; this solves the problem of ambiguous input viewpoints, so that reasonable point cloud models can be generated under the resulting constraints.
2. The present invention translates and rotates the preliminary point cloud model according to the image pose, and recovers a binary map through the network as supervision information for the generated point cloud model; binary map information is thus fully exploited for self-supervision, improving the accuracy of the target three-dimensional reconstruction result.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 is a schematic diagram of the target three-dimensional reconstruction process of the embodiment;
Fig. 3 is a schematic diagram of the network structure of Pose net.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, a target three-dimensional reconstruction method based on self-supervised learning comprises the following steps:
S1, training a point cloud autoencoder network to obtain the point cloud latent feature, wherein the point cloud autoencoder network includes a point cloud encoder and a point cloud decoder DP;
S2, training a binary map autoencoder network to obtain the binary map latent feature, wherein the binary map autoencoder network includes a binary map encoder and a binary map decoder DI;
S3, inputting an RGB image and obtaining a true binary map by binarization;
S4, performing feature extraction on the input RGB image with Pose net to obtain the image pose of the input RGB image;
S5, training an image encoder on the input RGB image to obtain a first space feature FP, and generating a preliminary point cloud model from the input RGB image through the point cloud decoder DP of step S1;
S6, applying translation and rotation to the preliminary point cloud model according to the image pose to generate a transformed point cloud model;
S7, training a point cloud encoder on the transformed point cloud model to obtain a second space feature FB, and generating a recovered binary map through the binary map decoder DI of step S2;
S8, computing the mean square error between the recovered binary map and the true binary map; if the mean square error is less than a preset threshold, outputting the transformed point cloud model as the target three-dimensional reconstruction result of the input RGB image, otherwise executing step S9;
S9, feeding the mean square error back to the image encoder and returning to step S5.
In this embodiment, the network structures of the point cloud autoencoder's encoder, the binary map autoencoder's encoder, the image encoder, the point cloud encoder, the point cloud decoder DP, and the binary map decoder DI are shown in Table 1:
Table 1
The detailed process of step S1 comprises: feeding a group of true aircraft point cloud data of size B × N × 3 into the point cloud encoder, obtaining a B × N × 512 feature after 5 layers of 1-D convolution, and then obtaining the B × 512 point cloud latent feature after a max pooling operation; the point cloud decoder DP consists of three fully connected layers and finally reshapes the point cloud latent feature into the B × N × 3 point cloud format, yielding the aircraft point cloud model.
The detailed process of step S2 comprises: binarizing a group of aircraft RGB images, i.e. representing pixels with value 0 as 0 and pixels with non-zero value as 1, to obtain true binary maps; the true binary maps are then fed into the binary map encoder to obtain the binary map latent feature; the binary map decoder DI uses deconvolution operations to fill in image content, so that the image content becomes gradually richer and the binary map of the aircraft RGB image is recovered.
This embodiment performs target three-dimensional reconstruction on aircraft images; the process is shown schematically in Fig. 2. Step S3 binarizes the input aircraft RGB image to obtain the true binary map of the aircraft.
The detailed process of step S4 comprises: obtaining the image pose (R, t) of the input RGB image through Pose net, whose network structure is shown in Fig. 3; its last layer uses a fully connected layer to regress a six-dimensional vector representing the image view-angle information, which comprises six parameters (α, β, γ, a, b, c), wherein (α, β, γ) denote the three deflection angles, i.e. the yaw, pitch, and roll angles, and (a, b, c) denotes the translation vector; in the image pose (R, t), R is the rotation matrix and t = (a, b, c), and the conversion formula from the deflection angles to the rotation matrix R is as follows:
The image encoder required for generating the preliminary point cloud model in step S5 has the same network structure as the binary map encoder, i.e. all layers are convolutional except the last, which is fully connected; it thus extracts deep features from the input RGB image and finally obtains the 512-dimensional first aircraft space feature FP through the fully connected layer. The mean square error between the first space feature FP and the point cloud latent feature forms an L1-normalized loss function; the decoder directly adopts the point cloud decoder DP trained in step S1, yielding the preliminary aircraft point cloud model.
The detailed process of step S6 comprises: applying translation and rotation to the preliminary point cloud model generated in step S5 using the image pose (R, t) of step S4, i.e. multiplying the pose with the preliminary point cloud, thereby transforming the preliminary point cloud model to the camera plane and obtaining the transformed aircraft point cloud model, so that each binary map corresponds to a 3D aircraft shape in the corresponding pose, realizing a one-to-one correspondence between point clouds and binary maps:
x′i = R xi + t,  i ∈ [0, N−1]
in which N denotes the number of points contained in the three-dimensional structure, and each point xi of the preliminary point cloud model yields x′i after transformation by (R, t), x′i being the corresponding point in the transformed point cloud model.
Step S7 feeds the transformed aircraft point cloud model into the point cloud encoder for encoding, outputs the second aircraft space feature FB, and forms an L1-normalized loss function with the binary map latent feature; the decoder directly adopts the binary map decoder DI trained in step S2, yielding the recovered binary map of the aircraft.
The detailed process of steps S8 and S9 comprises: computing the mean square error loss between the recovered binary map and the true binary map of the aircraft; if the mean square error loss is less than a preset threshold, the aircraft transformed point cloud model generated in step S6 is output as the target three-dimensional reconstruction result of the aircraft; otherwise the mean square error loss is fed back to the image encoder, and steps S5 to S9 are repeated until the optimal aircraft point cloud model is obtained as the target three-dimensional reconstruction result of the aircraft.
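The stopping criterion of steps S8/S9 can be sketched directly: compute the mean square error between the two binary maps and compare it with the preset threshold (the threshold value below is an assumption; the patent does not state one).

```python
import numpy as np

def binary_map_mse(recovered: np.ndarray, true_map: np.ndarray) -> float:
    """Mean square error between recovered and true binary maps
    (step S8). For 0/1 maps this equals the fraction of
    mismatched pixels."""
    diff = recovered.astype(float) - true_map.astype(float)
    return float((diff ** 2).mean())

def accept_reconstruction(recovered, true_map, threshold=0.05) -> bool:
    """True when the MSE is below the preset threshold, i.e. the
    transformed point cloud model is output as the result; otherwise
    the error would be fed back to the image encoder (step S9).
    The 0.05 threshold is illustrative only."""
    return binary_map_mse(recovered, true_map) < threshold

true_map = np.zeros((8, 8), dtype=np.uint8)
true_map[2:6, 2:6] = 1
recovered = true_map.copy()
recovered[2, 2] = 0  # one mismatched pixel out of 64
print(binary_map_mse(recovered, true_map))        # 0.015625
print(accept_reconstruction(recovered, true_map)) # True
```

Since both maps are binary, the MSE is exactly the per-pixel mismatch rate, which makes the threshold easy to interpret as a tolerated fraction of wrong silhouette pixels.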
Claims (9)
1. A target three-dimensional reconstruction method based on self-supervised learning, characterized by comprising the following steps:
S1, training a point cloud autoencoder network to obtain the point cloud latent feature, wherein the point cloud autoencoder network includes a point cloud encoder and a point cloud decoder DP;
S2, training a binary map autoencoder network to obtain the binary map latent feature, wherein the binary map autoencoder network includes a binary map encoder and a binary map decoder DI;
S3, inputting an RGB image and obtaining a true binary map by binarization;
S4, performing feature extraction on the input RGB image with Pose net to obtain the image pose of the input RGB image;
S5, training an image encoder on the input RGB image to obtain a first space feature FP, and generating a preliminary point cloud model from the input RGB image through the point cloud decoder DP of step S1;
S6, applying translation and rotation to the preliminary point cloud model according to the image pose to generate a transformed point cloud model;
S7, training a point cloud encoder on the transformed point cloud model to obtain a second space feature FB, and generating a recovered binary map through the binary map decoder DI of step S2;
S8, computing the mean square error between the recovered binary map and the true binary map; if the mean square error is less than a preset threshold, outputting the transformed point cloud model as the target three-dimensional reconstruction result of the input RGB image, otherwise executing step S9;
S9, feeding the mean square error back to the image encoder and returning to step S5.
2. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that the detailed process of obtaining the point cloud latent feature in step S1 is as follows:
S11, inputting true point cloud data into the point cloud encoder and obtaining a B × N × 512 feature after 5 layers of 1-D convolution;
S12, applying a max pooling operation to the B × N × 512 feature of step S11 to obtain the B × 512 point cloud latent feature, wherein k = 512.
3. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that the point cloud decoder DP of step S1 comprises three fully connected layers, which convert the point cloud latent feature into a B × N × 3 point cloud format.
4. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that the detailed process of obtaining the binary map latent feature in step S2 is as follows:
S21, binarizing the RGB image to obtain the true binary map, wherein binarization represents pixels with value 0 as 0 and pixels with non-zero value as 1;
S22, inputting the true binary map into the binary map encoder to obtain the binary map latent feature, wherein k = 512.
5. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that the binary map decoder DI of step S2 uses deconvolution operations to fill in image content, so that the image content is gradually enriched to recover the binary map.
6. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that in step S4 Pose net regresses the image view-angle information with a fully connected layer to obtain the image pose; the image view-angle information comprises six parameters (α, β, γ, a, b, c), wherein (α, β, γ) are deflection angles denoting the yaw, pitch, and roll angles respectively, and (a, b, c) is the translation vector;
the image pose is (R, t), wherein R is the rotation matrix and t = (a, b, c); the conversion formula from the deflection angles (α, β, γ) to the rotation matrix R is as follows:
7. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that step S5 specifically comprises the following steps:
S51, inputting the RGB image into the image encoder to obtain the first space feature FP;
S52, forming a first loss function from the first space feature FP and the point cloud latent feature;
S53, generating the preliminary point cloud model with the point cloud decoder DP according to the first loss function and the first space feature FP.
8. The target three-dimensional reconstruction method based on self-supervised learning according to claim 1, characterized in that step S7 specifically comprises the following steps:
S71, inputting the transformed point cloud model into the point cloud encoder to obtain the second space feature FB;
S72, forming a second loss function from the second space feature FB and the binary map latent feature;
S73, generating the recovered binary map with the binary map decoder DI according to the second loss function and the second space feature FB.
9. The target three-dimensional reconstruction method based on self-supervised learning according to claim 6, characterized in that generating the transformed point cloud model in step S6 specifically multiplies the image pose with the preliminary point cloud model, thereby transforming the preliminary point cloud model to the camera plane:
x′i = R xi + t,  i ∈ [0, N−1]
wherein xi is a point in the preliminary point cloud model, x′i is the corresponding point in the transformed point cloud model, N denotes the number of points contained in the three-dimensional structure, and each point xi is multiplied by the image pose (R, t) to obtain x′i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910367420.6A CN110246212B (en) | 2019-05-05 | 2019-05-05 | Target three-dimensional reconstruction method based on self-supervision learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910367420.6A CN110246212B (en) | 2019-05-05 | 2019-05-05 | Target three-dimensional reconstruction method based on self-supervision learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110246212A true CN110246212A (en) | 2019-09-17 |
CN110246212B CN110246212B (en) | 2023-02-07 |
Family
ID=67883786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910367420.6A Active CN110246212B (en) | 2019-05-05 | 2019-05-05 | Target three-dimensional reconstruction method based on self-supervision learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110246212B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311664A (en) * | 2020-03-03 | 2020-06-19 | 上海交通大学 | Joint unsupervised estimation method and system for depth, pose and scene stream |
CN112767468A (en) * | 2021-02-05 | 2021-05-07 | 中国科学院深圳先进技术研究院 | Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement |
CN113438481A (en) * | 2020-03-23 | 2021-09-24 | 富士通株式会社 | Training method, image coding method, image decoding method and device |
CN113592913A (en) * | 2021-08-09 | 2021-11-02 | 中国科学院深圳先进技术研究院 | Method for eliminating uncertainty of self-supervision three-dimensional reconstruction |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1747559A (en) * | 2005-07-29 | 2006-03-15 | 北京大学 | Three-dimensional geometric mode building system and method |
US20100309304A1 (en) * | 2007-10-24 | 2010-12-09 | Bernard Chalmond | Method and Device for the Reconstruction of the Shape of an Object from a Sequence of Sectional Images of Said Object |
CN107481313A (en) * | 2017-08-18 | 2017-12-15 | 深圳市唯特视科技有限公司 | A kind of dense three-dimensional object reconstruction method based on study available point cloud generation |
CN108665499A (en) * | 2018-05-04 | 2018-10-16 | 北京航空航天大学 | A kind of low coverage aircraft pose measuring method based on parallax method |
CN108694741A (en) * | 2017-04-07 | 2018-10-23 | 杭州海康威视数字技术股份有限公司 | A kind of three-dimensional rebuilding method and device |
CN109087325A (en) * | 2018-07-20 | 2018-12-25 | 成都指码科技有限公司 | A kind of direct method point cloud three-dimensional reconstruction and scale based on monocular vision determines method |
CN109583304A (en) * | 2018-10-23 | 2019-04-05 | 宁波盈芯信息科技有限公司 | A kind of quick 3D face point cloud generation method and device based on structure optical mode group |
- 2019
- 2019-05-05: CN application CN201910367420.6A granted as CN110246212B (active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1747559A (en) * | 2005-07-29 | 2006-03-15 | 北京大学 | Three-dimensional geometric mode building system and method |
US20100309304A1 (en) * | 2007-10-24 | 2010-12-09 | Bernard Chalmond | Method and Device for the Reconstruction of the Shape of an Object from a Sequence of Sectional Images of Said Object |
CN108694741A (en) * | 2017-04-07 | 2018-10-23 | 杭州海康威视数字技术股份有限公司 | A kind of three-dimensional rebuilding method and device |
CN107481313A (en) * | 2017-08-18 | 2017-12-15 | 深圳市唯特视科技有限公司 | A kind of dense three-dimensional object reconstruction method based on study available point cloud generation |
CN108665499A (en) * | 2018-05-04 | 2018-10-16 | 北京航空航天大学 | A kind of low coverage aircraft pose measuring method based on parallax method |
CN109087325A (en) * | 2018-07-20 | 2018-12-25 | 成都指码科技有限公司 | A kind of direct method point cloud three-dimensional reconstruction and scale based on monocular vision determines method |
CN109583304A (en) * | 2018-10-23 | 2019-04-05 | 宁波盈芯信息科技有限公司 | A kind of quick 3D face point cloud generation method and device based on structure optical mode group |
Non-Patent Citations (3)
Title |
---|
ANDRÉ HENN, GERHARD GRÖGER, VIKTOR STROH, LUTZ PLÜMER: "Model driven reconstruction of roofs from sparse LIDAR point clouds", ISPRS Journal of Photogrammetry and Remote Sensing *
THOMAS KRIJNEN, JAKOB BEETZ: "An IFC schema extension and binary serialization format to efficiently integrate point cloud data into building models", Advanced Engineering Informatics *
HUANG YU: "Autonomous robot vision positioning method based on target features", China Masters' Theses Full-text Database (Information Science and Technology) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311664A (en) * | 2020-03-03 | 2020-06-19 | 上海交通大学 | Joint unsupervised estimation method and system for depth, pose and scene stream |
CN111311664B (en) * | 2020-03-03 | 2023-04-21 | 上海交通大学 | Combined unsupervised estimation method and system for depth, pose and scene flow |
CN113438481A (en) * | 2020-03-23 | 2021-09-24 | 富士通株式会社 | Training method, image coding method, image decoding method and device |
CN113438481B (en) * | 2020-03-23 | 2024-04-12 | 富士通株式会社 | Training method, image encoding method, image decoding method and device |
CN112767468A (en) * | 2021-02-05 | 2021-05-07 | 中国科学院深圳先进技术研究院 | Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement |
WO2022166412A1 (en) * | 2021-02-05 | 2022-08-11 | 中国科学院深圳先进技术研究院 | Self-supervised three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement |
CN112767468B (en) * | 2021-02-05 | 2023-11-03 | 中国科学院深圳先进技术研究院 | Self-supervision three-dimensional reconstruction method and system based on collaborative segmentation and data enhancement |
CN113592913A (en) * | 2021-08-09 | 2021-11-02 | 中国科学院深圳先进技术研究院 | Method for eliminating uncertainty of self-supervision three-dimensional reconstruction |
CN113592913B (en) * | 2021-08-09 | 2023-12-26 | 中国科学院深圳先进技术研究院 | Method for eliminating uncertainty of self-supervision three-dimensional reconstruction |
Also Published As
Publication number | Publication date |
---|---|
CN110246212B (en) | 2023-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110246212A (en) | Target three-dimensional reconstruction method based on self-supervised learning | |
Mandikal et al. | Dense 3d point cloud reconstruction using a deep pyramid network | |
Le et al. | Pointgrid: A deep network for 3d shape understanding | |
CN110288695B (en) | Single-frame image three-dimensional model surface reconstruction method based on deep learning | |
CN102592275B (en) | Virtual viewpoint rendering method | |
CN111862101A (en) | 3D point cloud semantic segmentation method under aerial view coding visual angle | |
CN112396703A (en) | Single-image three-dimensional point cloud model reconstruction method | |
Zhang et al. | Research on image processing technology of computer vision algorithm | |
CN108280858A | Linear global camera motion parameter estimation method in multi-view reconstruction | |
CN110827295A (en) | Three-dimensional semantic segmentation method based on coupling of voxel model and color information | |
CN107993255A (en) | A kind of dense optical flow method of estimation based on convolutional neural networks | |
CN102306386A (en) | Method for quickly constructing third dimension tree model from single tree image | |
CN110889901B (en) | Large-scene sparse point cloud BA optimization method based on distributed system | |
CN114463511A (en) | 3D human body model reconstruction method based on Transformer decoder | |
CN110443883A | Plane three-dimensional reconstruction method from a single color image based on dropblock | |
CN113077554A (en) | Three-dimensional structured model reconstruction method based on any visual angle picture | |
CN106127743B | Method and system for automatically reconstructing the accurate relative position of a two-dimensional image and a three-dimensional model | |
CN115951784A (en) | Dressing human body motion capture and generation method based on double nerve radiation fields | |
CN106251281A (en) | A kind of image morphing method based on shape interpolation | |
CN114677479A (en) | Natural landscape multi-view three-dimensional reconstruction method based on deep learning | |
CN116134491A (en) | Multi-view neuro-human prediction using implicit differentiable renderers for facial expression, body posture morphology, and clothing performance capture | |
Liu et al. | Facial image inpainting using attention-based multi-level generative network | |
Fang et al. | One is all: Bridging the gap between neural radiance fields architectures with progressive volume distillation | |
CN108230431B (en) | Human body action animation generation method and system of two-dimensional virtual image | |
CN103971397B (en) | The global illumination method for drafting reduced based on virtual point source and sparse matrix |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||