CN108876907A - An active three-dimensional reconstruction method for target objects - Google Patents
- Publication number: CN108876907A
- Application number: CN201810576919.3A
- Authority: CN (China)
- Prior art keywords: viewpoint, module, network, three-dimensional, target
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of computer vision and provides an active three-dimensional reconstruction method for target objects. The method comprises two modules: a view dynamic prediction module and a target autonomous reconstruction module, each described in terms of its input, architecture, and training method. To address the problems that conventional three-dimensional object reconstruction is vulnerable to environmental interference, inefficient, and difficult to make autonomous, the invention designs an autonomous object reconstruction framework and software platform based on deep learning. For a given target, it dynamically plans the scanning viewpoints and combines the pictures captured from different viewpoints to complete the construction of the target's three-dimensional model.
Description
Technical field
The invention belongs to the technical field of computer vision, and in particular relates to a method for autonomous three-dimensional reconstruction of a single target based on deep learning.
Background technique
With the development of SLAM (Simultaneous Localization and Mapping) technology, three-dimensional reconstruction methods for indoor scenes have matured. Three-dimensional reconstruction generally comprises three parts: first, a handheld camera scans the target to be reconstructed from multiple viewpoints; then features are extracted and matched across the scanned frames and the camera pose is estimated; finally, stereo vision techniques map two-dimensional pixels to three-dimensional coordinate points, yielding the reconstructed model. In previous work, however, the target is usually scanned with "no blind spots", i.e. every local structure of the target must be covered. This is inefficient, and a scanning path planned for one target cannot be applied to other types of targets, which challenges the autonomy of the scanning process. An efficient reconstruction method that autonomously plans the scanning viewpoints during reconstruction is therefore a current technical challenge and the motivation of this patent. The relevant background technologies in this field are discussed in detail below.
(1) Three-dimensional reconstruction
As early as 2010, the University of Washington and Microsoft Research developed a real-time visual SLAM system based on SIFT (scale-invariant feature transform) feature matching for localization and the TORO (Tree-based netwORk Optimizer) optimization algorithm; this real-time system can build a three-dimensional map of a scene. Many subsequent works improved on its real-time performance and pose estimation, including the RGBD-SLAM algorithm, KinectFusion, and BundleFusion, which largely satisfy the real-time requirements of scene reconstruction with user interaction.
However, these algorithms inevitably face the following problems. First, they require densely sampled viewpoints yet skip most viewpoints in the actual computation, so information acquisition is inefficient and the methods cannot be applied in heavily occluded scenes. Second, they must assume that targets in the scene exhibit no specular reflection and possess rich texture, so that image features can be extracted. Third, although various optimization strategies are used, the accumulation of camera registration error remains.
To solve the above problems, Choy et al. introduced deep learning into target three-dimensional reconstruction, proposing a three-dimensional recurrent neural network. By constructing a three-dimensional hidden-layer state, it receives pictures shot from multiple viewpoints while implicitly representing the geometry reconstructed so far, establishing a unified single-view and multi-view reconstruction framework that, with a single view or a few views (fewer than 20), outperforms traditional methods.
The application of deep learning opens a new avenue for three-dimensional reconstruction: compared with conventional methods, it uses fewer visual inputs and can handle complex environmental factors. The combination of conventional methods and deep learning may become a new direction in this field.
(2) Autonomous acquisition of three-dimensional information
A robot observing an unknown target from an arbitrary viewpoint can actively predict the next observation viewpoint based on the current observation, and then predict a further viewpoint from the observation at that next viewpoint; such continuous viewpoint prediction helps the robot actively perceive the information of interest and complete the corresponding visual task. Actively perceiving the surrounding environment with a consumer-grade camera and digitizing the acquired environmental information is a technical challenge in robotics. Previous work usually maximizes the received information under certain rule-based constraints; common methods include entropy reduction, uncertainty reduction, Monte Carlo sampling, and Gaussian process regression.
In recent years deep learning has also been used to predict viewpoint selection, for example evaluating the information gain of a viewpoint with a deep belief network, or visual attention models based on reinforcement learning. Worth emphasizing is the three-dimensional target autonomous recognition work of Xu Kai et al., which combines multi-view convolutional neural networks [24] with a recurrent attention model to actively acquire depth data during target recognition, and which in follow-up work introduces a spatial transformer network to achieve end-to-end learning. The active perception of this scheme is embodied in that, given the visual observation at the current viewpoint, it can predict the next best viewpoint, realizing viewpoint prediction with visual feedback and allowing the robot to complete target recognition autonomously.
In summary, the autonomous perception of visual information in the environment is still at an exploratory stage; current work mainly focuses on target recognition and retrieval, and the more complex problem of target reconstruction needs further exploration.
Summary of the invention
To solve the problems that conventional three-dimensional object reconstruction is vulnerable to environmental interference, inefficient, and difficult to make autonomous, the present invention designs an autonomous object reconstruction framework and software platform based on deep learning. For a given target, it dynamically plans the scanning viewpoints and combines the pictures under different viewpoints to complete the construction of the target's three-dimensional model.
Technical solution of the present invention:
An active three-dimensional reconstruction method for target objects, comprising the following two modules:
(1) View dynamic prediction module:
(1.1) Module input:
An RGBD camera is used to collect target information indoors. From an arbitrary random viewpoint v0 around the target, a color photo is shot with the RGBD camera; its three RGB channels are taken and the resolution is compressed to 64 × 64, yielding a 64 × 64 × 3 tensor I0. The random viewpoint v0 and the picture tensor I0 together form the input of the module.
(1.2) Module architecture:
The view dynamic prediction module is a two-branch neural network with different states, inputs, and outputs at different time steps. At a time step t, the first branch network is a fully connected layer fview that takes the viewpoint vt as input and computes the corresponding viewpoint feature fview(vt). The second branch network is a multi-layer convolutional recurrent neural network fenc, called the encoding network, responsible for encoding the input picture tensor It into a low-dimensional feature Ft, using the memory state of the recurrent layer at the previous time step; at t = 1 this state is the identity matrix.
The features extracted by the two branch networks are then multiplied element-wise and processed by another recurrent layer fgru and a fully connected layer ffc, yielding the final feature vector as in formula (1):
F = ffc(fgru(fview(vt) ⊙ Ft))   (1)
where fgru also uses the memory state of the recurrent layer at the previous time step; at t = 1 this state is the identity matrix.
Finally, the feature vector F is passed through a sigmoid (S) function to obtain the predicted viewpoint, which serves as the input viewpoint vt+1 of the next time step; meanwhile the picture under that viewpoint is captured with the RGBD camera, yielding the input tensor It+1 of the next time step.
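A toy forward pass of the two-branch network can be sketched as follows. This is a minimal stand-in, not the patent's implementation: the multi-layer convolutional recurrent encoder is replaced by a single dense GRU cell over flattened pixels, the feature size D = 128 and the 2-parameter viewpoint are assumptions, and the recurrent states are initialised to zero vectors rather than to the identity matrix mentioned in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell, standing in for the patent's recurrent layers."""
    def __init__(self, nin, nh):
        s = 1.0 / np.sqrt(nin + nh)
        self.wz = rng.uniform(-s, s, (nin + nh, nh)); self.bz = np.zeros(nh)
        self.wr = rng.uniform(-s, s, (nin + nh, nh)); self.br = np.zeros(nh)
        self.wh = rng.uniform(-s, s, (nin + nh, nh)); self.bh = np.zeros(nh)
    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(dense(xh, self.wz, self.bz))          # update gate
        r = sigmoid(dense(xh, self.wr, self.br))          # reset gate
        hb = np.tanh(dense(np.concatenate([x, r * h]), self.wh, self.bh))
        return (1 - z) * h + z * hb

D = 128                                        # assumed feature dimension
W_view = rng.standard_normal((2, D)) * 0.01    # f_view: viewpoint -> feature
W_fc = rng.standard_normal((D, 2)) * 0.01      # f_fc: feature -> next viewpoint
enc_gru = GRUCell(64 * 64 * 3, D)              # stand-in for the encoder f_enc
fuse_gru = GRUCell(D, D)                       # stand-in for f_gru

def predict_next_view(v_t, I_t, h_enc, h_fuse):
    f_v = dense(v_t, W_view, np.zeros(D))      # branch 1: viewpoint feature
    F_t = enc_gru(I_t.ravel(), h_enc)          # branch 2: image feature F_t
    F = fuse_gru(f_v * F_t, h_fuse)            # element-wise fusion, recurrence
    v_next = sigmoid(dense(F, W_fc, np.zeros(2)))  # S-function output in (0, 1)
    return v_next, F_t, F

v1, h_enc, h_fuse = predict_next_view(
    np.array([0.3, 0.5]), rng.random((64, 64, 3)), np.zeros(D), np.zeros(D))
```

The returned recurrent states would be fed back in at the next time step, matching the module's stateful behaviour across time steps.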
(1.3) Training method:
In the view dynamic prediction module, the neural network is trained with reinforcement learning. At each time step, the reward is computed according to formula (2):
rt = IoU(V̂t, V)   (2)
where V̂t is the three-dimensional reconstruction model at time t, V is the ground-truth model in the database, and IoU is the proportion of the two models' overlapping elements to the total number of elements.
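The IoU used by the reward, overlapping occupied voxels over the union of occupied voxels, can be computed directly; the binary-occupancy interpretation of "elements" is an assumption:

```python
import numpy as np

def voxel_iou(pred, gt):
    """IoU of two binary voxel grids: intersection of occupied cells / union."""
    p, g = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else 1.0

# toy 2 x 2 x 2 grids: 1 overlapping voxel, 3 occupied cells in the union
a = np.zeros((2, 2, 2)); a[0, 0, 0] = a[0, 0, 1] = 1
b = np.zeros((2, 2, 2)); b[0, 0, 0] = b[1, 1, 1] = 1
r = voxel_iou(a, b)   # 1 / 3
```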
After accumulating the rewards of all time steps, the network is trained with a policy-gradient optimization method. Specifically, gradient descent is used to compute the gradient as in formula (3), and the network parameters are updated iteratively along the gradient, yielding a neural network that predicts the optimal viewpoint:
∇J = R · ∇ log p(vt+1)   (3)
where R is the cumulative reward of all time steps and p(vt+1) is the probability of predicting that viewpoint.
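A minimal policy-gradient (REINFORCE-style) update consistent with formula (3) can be sketched as follows; the discrete candidate-view policy, the softmax parametrisation, and the learning rate are illustrative assumptions, not the patent's network:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# toy policy over a discrete set of 4 candidate viewpoints
theta = np.zeros(4)                             # one logit per candidate view

def reinforce_step(rewards, actions, lr=0.1):
    """One update: grad of log pi(action) scaled by the episode return R."""
    global theta
    R = sum(rewards)                            # cumulative reward, all steps
    grad = np.zeros_like(theta)
    for a in actions:
        p = softmax(theta)
        g = -p; g[a] += 1.0                     # d/dtheta of log softmax(theta)[a]
        grad += R * g
    theta = theta + lr * grad                   # ascend the policy gradient

before = softmax(theta)[2]
reinforce_step(rewards=[0.2, 0.3], actions=[2, 2])
after = softmax(theta)[2]   # probability of the rewarded view increases
```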
(2) Target autonomous reconstruction module:
(2.1) Module input:
The RGBD camera shoots a picture at each viewpoint predicted by the view dynamic prediction module. In time-step order, each frame is processed with the same method as in the view dynamic prediction module, yielding an ordered sequence of picture tensors {I0, I1, ..., In} as the input of the module.
(2.2) Module architecture:
The target autonomous reconstruction module is a recurrent neural network comprising an encoding network and a decoding network. The encoding network fenc reuses the encoding network (the second branch) of the view dynamic prediction module, i.e. a multi-layer convolutional recurrent neural network, and is responsible for encoding the picture tensor It input at time step t into a low-dimensional feature Ft. The decoding network fdec consists of multi-layer three-dimensional deconvolutional recurrent layers, which up-scale the low-dimensional picture feature Ft into a three-dimensional voxel, using the memory state of the recurrent layer at the previous time step; at t = 1 this state is the identity matrix.
For a picture sequence of t time steps, the pictures are input sequentially into the module in time-step order, and the three-dimensional voxel predicted at the last (i.e. t-th) time step is the three-dimensional reconstruction result of the target.
(2.3) Training method:
The target autonomous reconstruction module is trained with back-propagation and stochastic gradient descent. For a batch of samples, the error between the network's prediction and the ground truth in the database is computed according to formula (4), and the gradient of the error is computed; following the back-propagation of the neural network, the network parameters are updated step by step along the direction of gradient descent, iterating until convergence:
Lvox = ||Vpre - V||²   (4)
where Vpre and V respectively denote the voxel model predicted by the target autonomous reconstruction module network and the corresponding voxel model in the database.
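Formula (4) is a squared L2 distance over voxels and is direct to compute; summing over all voxels (rather than averaging) is an assumption:

```python
import numpy as np

def voxel_loss(v_pre, v_gt):
    """L_vox = || V_pre - V ||^2, summed over all voxels (formula (4))."""
    d = v_pre - v_gt
    return float(np.sum(d * d))

# toy 2 x 2 x 2 case: one occupied ground-truth voxel, uniform 0.5 prediction
v_gt = np.zeros((2, 2, 2)); v_gt[0, 0, 0] = 1.0
v_pre = np.full((2, 2, 2), 0.5)
L = voxel_loss(v_pre, v_gt)   # 8 cells, each squared difference 0.25 -> 2.0
```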
Compared with similar methods, the present invention has the following distinguishing features:
(1) Autonomy of the reconstruction process
Different from traditional "no blind spot" scanning, this patent is oriented toward the autonomy of the reconstruction process, which manifests in two respects. First, other reconstruction methods mostly use a pre-specified path, scanning each local structure of the target with a sensor and computing the target's three-dimensional information from multiple scan frames. As a result, the route designed for one target generally cannot be applied directly to other classes of targets, so generality is poor; in particular, occlusion in the environment may obstruct the planning of some paths, making autonomous viewpoint planning for reconstruction difficult. The dynamic viewpoint prediction of this patent, in contrast, performs viewpoint prediction and target modeling simultaneously during reconstruction for different types of targets, achieving autonomy. Second, other reconstruction methods determine the information content of a scanned viewpoint by measuring the reconstruction result, whereas this patent adopts a "what you see is what you get" approach: the next optimal viewpoint is judged directly from the captured photo, without reconstruction, improving efficiency while endowing the whole automated process with more "intelligence".
(2) Application of deep learning
Different from traditional multi-frame matching and stereo vision methods, this patent uses deep learning to model the target in three dimensions, which brings two advantages. First, other methods generally need to compute matching features between adjacent frames, and therefore require many viewpoints to guarantee a sufficiently large overlap between adjacent frames. Owing to its data-driven capability, the deep learning method can use the information under one field of view of the target to predict the information under other fields of view and the global structure, enabling the system to complete the target's three-dimensional reconstruction with few viewpoints, improving both efficiency and robustness. Second, other methods are limited by environmental factors: under intense illumination, specular reflection, or occlusion, it is difficult to extract enough features, which increases the camera pose estimation error and prevents an ideal reconstruction. Deep learning, however, has the ability to predict global information from local information, can often extract robust features even under harsh environmental factors, and obtains better reconstruction results.
Description of the drawings
Fig. 1 is a diagram of the database concept. The database contains massive target data of multiple categories; different targets are retrieved by individual indices, and each target includes data in two forms: two-dimensional pictures rendered from multiple viewpoints (the viewpoints must cover all structures of the target) and a voxelized three-dimensional voxel model.
Fig. 2 is the network architecture diagram. The architecture of the autonomous reconstruction model is drawn for the data flow of a single time step; the network contains the target autonomous reconstruction module and the view dynamic prediction module, used respectively to obtain the reconstructed model and to predict the next viewpoint.
Fig. 3 is the autonomous reconstruction flow chart. Figure (a) shows the change of camera position during dynamic viewpoint selection, and figure (b) shows the whole autonomous reconstruction flow: starting from an arbitrary viewpoint, the picture under that viewpoint is shot and passed through the pre-trained view dynamic prediction module and target autonomous reconstruction module. The former receives the picture and the current viewpoint and judges the next best viewpoint; the latter reconstructs the current three-dimensional model. This process repeats: the view dynamic prediction module repeatedly judges the next viewpoint to obtain the next picture input, and the target autonomous reconstruction module continuously reconstructs the model until the iteration terminates.
Specific embodiments
The invention is described in further detail below with reference to specific embodiments, but the invention is not limited to these embodiments.
A method for autonomous three-dimensional reconstruction of a single target based on deep learning comprises network model training and autonomous model reconstruction:
(1) Training the network model
First, a large-scale database is constructed containing massive three-dimensional mesh models of different categories; training data (three-dimensional voxel models and multi-view pictures of the targets, illustrated in Fig. 1) are obtained by multi-view rendering and voxelization of the mesh models, and the network model is built according to Fig. 2. Then the training data are fed in batches, with multiple threads, into the network model to be trained, and the errors of the view dynamic prediction module and the target autonomous reconstruction module are computed according to formulas (3) and (4). Finally, a gradient-descent optimizer with back-propagation iteratively updates the network parameters until the number of iterations meets the requirement, completing the training of the network.
(2) Active reconstruction process
For a target to be reconstructed, a robot is operated, or a person holds the RGBD camera, at a random position around the target, ensuring that the target is in the camera's field of view at that viewpoint. A photo is shot with the camera; its three RGB channels are taken, the resolution is compressed, and the result is input into the trained network. The target autonomous reconstruction module then outputs the currently reconstructed model, and the view dynamic prediction module outputs the next optimal viewpoint. The robot or person moves the RGBD camera to the next optimal viewpoint, continues to shoot pictures, feeds them into the trained network, and obtains the reconstructed model and the next viewpoint. This process repeats until the reconstructed model meets the requirements or the number of viewpoints reaches a preset threshold; the structure output by the target autonomous reconstruction module at that point is the final target reconstruction model, as illustrated in Fig. 3.
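The active reconstruction loop can be sketched end to end with stub networks standing in for the two trained modules; everything inside the stubs (the shapes, the view budget of 5) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

def capture(view):                    # stub for the RGBD camera shot at `view`
    return rng.random((64, 64, 3))

def predict_view(view, img):          # stub for the view dynamic prediction module
    return (view + rng.random(2)) % 1.0

def rebuild(frames):                  # stub for the target reconstruction module
    return rng.random((32, 32, 32))

def active_reconstruct(max_views=5):
    """Alternate viewpoint prediction and reconstruction until the budget is hit."""
    view, frames, model = rng.random(2), [], None
    for _ in range(max_views):        # terminate at the preset view threshold
        frames.append(capture(view))
        model = rebuild(frames)       # current reconstruction from all frames
        view = predict_view(view, frames[-1])   # judge the next best view
    return model, len(frames)

model, n = active_reconstruct()
```

In a real deployment, the loop could also terminate early once the reconstructed model meets a quality requirement, as the embodiment describes.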
Claims (1)
1. An active three-dimensional reconstruction method for target objects, characterized in that the active three-dimensional reconstruction method comprises two modules:
(1) View dynamic prediction module:
(1.1) Module input:
an RGBD camera is used to collect target information indoors; from an arbitrary random viewpoint v0 around the target, a color photo is shot with the RGBD camera, its three RGB channels are taken, and the resolution is compressed to 64 × 64, yielding a 64 × 64 × 3 tensor I0; the random viewpoint v0 and the picture tensor I0 together form the input of the module;
(1.2) Module architecture:
the view dynamic prediction module is a two-branch neural network with different states, inputs, and outputs at different time steps; at a time step t, the first branch network is a fully connected layer fview that takes the viewpoint vt as input and computes the corresponding viewpoint feature fview(vt); the second branch network is a multi-layer convolutional recurrent neural network fenc, called the encoding network, responsible for encoding the input picture tensor It into a low-dimensional feature Ft, using the memory state of the recurrent layer at the previous time step, whose value at t = 1 is the identity matrix;
the features extracted by the two branch networks are then multiplied element-wise and processed by another recurrent layer fgru and a fully connected layer ffc, yielding the final feature vector as in formula (1):
F = ffc(fgru(fview(vt) ⊙ Ft))   (1)
where fgru also uses the memory state of the recurrent layer at the previous time step, whose value at t = 1 is the identity matrix;
finally, the feature vector F is processed by the sigmoid (S) function to obtain the predicted viewpoint, which serves as the input viewpoint vt+1 of the next time step; meanwhile the picture under that viewpoint is obtained with the RGBD camera, yielding the input tensor It+1 of the next time step;
(1.3) Training method:
in the view dynamic prediction module, the neural network is trained with reinforcement learning; at each time step, the reward is computed according to formula (2):
rt = IoU(V̂t, V)   (2)
where V̂t is the three-dimensional reconstruction model at time t, V is the ground-truth model in the database, and IoU is the proportion of the two models' overlapping elements to the total number of elements;
after accumulating the rewards of all time steps, the network is trained with a policy-gradient optimization method; specifically, gradient descent is used to compute the gradient as in formula (3), and the network parameters are updated iteratively along the gradient, yielding a neural network that predicts the optimal viewpoint:
∇J = R · ∇ log p(vt+1)   (3)
where R is the cumulative reward of all time steps and p(vt+1) is the probability of predicting that viewpoint;
(2) Target autonomous reconstruction module:
(2.1) Module input:
the RGBD camera shoots a picture at each viewpoint predicted by the view dynamic prediction module; in time-step order, each frame is processed with the same method as in the view dynamic prediction module, yielding an ordered sequence of picture tensors {I0, I1, ..., In} as the input of the module;
(2.2) Module architecture:
the target autonomous reconstruction module is a recurrent neural network comprising an encoding network and a decoding network; the encoding network fenc reuses the encoding network of the view dynamic prediction module, i.e. a multi-layer convolutional recurrent neural network, and is responsible for encoding the picture tensor It input at time step t into a low-dimensional feature Ft; the decoding network fdec consists of multi-layer three-dimensional deconvolutional recurrent layers, which up-scale the low-dimensional picture feature Ft into a three-dimensional voxel, using the memory state of the recurrent layer at the previous time step, whose value at t = 1 is the identity matrix;
for a picture sequence of t time steps, the pictures are input sequentially into the module in time-step order, and the three-dimensional voxel predicted at the last time step is the three-dimensional reconstruction result of the target;
(2.3) Training method:
the target autonomous reconstruction module is trained with back-propagation and stochastic gradient descent; for a batch of samples, the error between the network's prediction and the ground truth in the database is computed according to formula (4), and the gradient of the error is computed; following the back-propagation of the neural network, the network parameters are updated step by step along the direction of gradient descent, iterating until convergence:
Lvox = ||Vpre - V||²   (4)
where Vpre and V respectively denote the voxel model predicted by the target autonomous reconstruction module network and the corresponding voxel model in the database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810576919.3A CN108876907A (en) | 2018-05-31 | 2018-05-31 | A kind of active three-dimensional rebuilding method of object-oriented object |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108876907A true CN108876907A (en) | 2018-11-23 |
Family
ID=64336997
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810576919.3A Pending CN108876907A (en) | 2018-05-31 | 2018-05-31 | A kind of active three-dimensional rebuilding method of object-oriented object |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108876907A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9521399B1 (en) * | 2014-07-08 | 2016-12-13 | Aquifi, Inc. | Dynamically reconfigurable optical pattern generator module useable with a system to rapidly reconstruct three-dimensional data |
CN107330973A (en) * | 2017-07-03 | 2017-11-07 | 深圳市唯特视科技有限公司 | A kind of single-view method for reconstructing based on various visual angles supervision |
CN107862741A (en) * | 2017-12-10 | 2018-03-30 | 中国海洋大学 | A kind of single-frame images three-dimensional reconstruction apparatus and method based on deep learning |
CN108074218A (en) * | 2017-12-29 | 2018-05-25 | 清华大学 | Image super-resolution method and device based on optical field acquisition device |
Non-Patent Citations (1)
Title |
---|
XIN YANG, ET AL: "Active Object Reconstruction Using a Guided View Planner", arXiv:1805.03081v1 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109745062A (en) * | 2019-01-30 | 2019-05-14 | 腾讯科技(深圳)有限公司 | Generation method, device, equipment and the storage medium of CT image |
WO2020156195A1 (en) * | 2019-01-30 | 2020-08-06 | 腾讯科技(深圳)有限公司 | Ct image generation method and apparatus, computer device and computer-readable storage medium |
US12016717B2 (en) | 2019-01-30 | 2024-06-25 | Tencent Technology (Shenzhen) Company Limited | CT image generation method and apparatus, computer device, and computer-readable storage medium |
CN110458939A (en) * | 2019-07-24 | 2019-11-15 | 大连理工大学 | The indoor scene modeling method generated based on visual angle |
CN110458939B (en) * | 2019-07-24 | 2022-11-18 | 大连理工大学 | Indoor scene modeling method based on visual angle generation |
CN111127538A (en) * | 2019-12-17 | 2020-05-08 | 武汉大学 | Multi-view image three-dimensional reconstruction method based on convolution cyclic coding-decoding structure |
CN111127538B (en) * | 2019-12-17 | 2022-06-07 | 武汉大学 | Multi-view image three-dimensional reconstruction method based on convolution cyclic coding-decoding structure |
CN112950786A (en) * | 2021-03-01 | 2021-06-11 | 哈尔滨理工大学 | Vehicle three-dimensional reconstruction method based on neural network |
CN115115780A (en) * | 2022-06-29 | 2022-09-27 | 聚好看科技股份有限公司 | Three-dimensional reconstruction method and system based on multi-view RGBD camera |
CN115357845A (en) * | 2022-08-22 | 2022-11-18 | 浙江荷湖科技有限公司 | Evaluation method and device for microscopic light field iterative reconstruction result |
CN116630550A (en) * | 2023-07-21 | 2023-08-22 | 方心科技股份有限公司 | Three-dimensional model generation method and system based on multiple pictures |
CN116630550B (en) * | 2023-07-21 | 2023-10-20 | 方心科技股份有限公司 | Three-dimensional model generation method and system based on multiple pictures |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108876907A (en) | A kind of active three-dimensional rebuilding method of object-oriented object | |
CN110458939B (en) | Indoor scene modeling method based on visual angle generation | |
Mou et al. | IM2HEIGHT: Height estimation from single monocular imagery via fully residual convolutional-deconvolutional network | |
Pumarola et al. | D-nerf: Neural radiance fields for dynamic scenes | |
CN105787439B (en) | A kind of depth image human synovial localization method based on convolutional neural networks | |
Kumar et al. | Monocular fisheye camera depth estimation using sparse lidar supervision | |
Zhang et al. | Deep learning-based classification and reconstruction of residential scenes from large-scale point clouds | |
CN105678284B (en) | A kind of fixed bit human body behavior analysis method | |
Hepp et al. | Learn-to-score: Efficient 3d scene exploration by predicting view utility | |
CN105205453B (en) | Human eye detection and localization method based on depth self-encoding encoder | |
Carvalho et al. | Multitask learning of height and semantics from aerial images | |
CN107945265A (en) | Real-time dense monocular SLAM method and systems based on on-line study depth prediction network | |
Wu et al. | Magicpony: Learning articulated 3d animals in the wild | |
CN112396703A (en) | Single-image three-dimensional point cloud model reconstruction method | |
CN103426200B (en) | Tree three-dimensional reconstruction method based on unmanned aerial vehicle aerial photo sequence image | |
Ulusoy et al. | Image-based 4-d reconstruction using 3-d change detection | |
CN108389226A (en) | A kind of unsupervised depth prediction approach based on convolutional neural networks and binocular parallax | |
CN110310285A (en) | A kind of burn surface area calculation method accurately rebuild based on 3 D human body | |
CN111160294A (en) | Gait recognition method based on graph convolution network | |
Wang et al. | Study on the method of transmission line foreign body detection based on deep learning | |
CN112734727A (en) | Apple picking method based on improved deep neural network | |
Condorelli et al. | A comparison between 3D reconstruction using nerf neural networks and mvs algorithms on cultural heritage images | |
CN115115685A (en) | Monocular image depth estimation algorithm based on self-attention neural network | |
Pirker et al. | Fast and accurate environment modeling using three-dimensional occupancy grids | |
Hirner et al. | FC-DCNN: A densely connected neural network for stereo estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181123 |