CN110223380A - Scene modeling method, system, and device fusing aerial and ground multi-view images - Google Patents

Scene modeling method, system, and device fusing aerial and ground multi-view images

Info

Publication number
CN110223380A
Authority
CN
China
Prior art keywords
aerial
ground
multi-view image
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910502762.4A
Other languages
Chinese (zh)
Other versions
CN110223380B (en)
Inventor
申抒含
高翔
朱灵杰
胡占义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201910502762.4A priority Critical patent/CN110223380B/en
Publication of CN110223380A publication Critical patent/CN110223380A/en
Application granted granted Critical
Publication of CN110223380B publication Critical patent/CN110223380B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/30 — Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 — Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20212 — Image combination
    • G06T2207/20221 — Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of scene modeling, and in particular relates to a scene modeling method, system, and device fusing aerial and ground multi-view images, aiming to solve the problem that image-based modeling of indoor scenes with complex structure and scarce texture yields incomplete and inaccurate results. The method of the invention comprises: S100, acquiring aerial multi-view images of the indoor scene to be modeled and constructing an aerial map; S200, based on the aerial map, obtaining synthesized images by synthesizing ground-view reference images from the aerial map; S300, acquiring ground multi-view images with a ground camera to obtain a ground multi-view image set; S400, based on the synthesized images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model. The invention can generate complete, accurate indoor scene models, balances acquisition efficiency against reconstruction accuracy, and is highly robust.

Description

Scene modeling method, system, and device fusing aerial and ground multi-view images
Technical field
The invention belongs to the field of scene modeling, and in particular relates to a scene modeling method, system, and device fusing aerial and ground multi-view images.
Background technique
Indoor scene 3D reconstruction plays an important role in many practical applications, such as indoor navigation, service robots, and building information modeling (BIM). Existing indoor scene reconstruction methods can be roughly divided into three classes: (1) methods based on LiDAR (light detection and ranging), (2) methods based on RGB-D cameras, and (3) methods based on images.
Although LiDAR-based and RGB-D-based methods achieve higher accuracy, both suffer from high cost and poor scalability when reconstructing large indoor scenes. For LiDAR-based methods, occlusions caused by the limited scanning viewpoint are hard to avoid, so multi-view laser scanning and point-cloud alignment are usually required. For RGB-D-based methods, the limited effective working distance of the sensor means that large amounts of data must be acquired and processed. Both classes of methods are therefore costly and inefficient for large-scale indoor reconstruction.
Compared with LiDAR-based and RGB-D-based methods, image-based methods are cheaper and more flexible, but they also have shortcomings: complex scenes, repetitive structures, and lack of texture lead to incomplete, inaccurate reconstructions. Even the current state-of-the-art structure from motion (SfM) and multi-view stereo (MVS) techniques still perform unsatisfactorily in larger, structurally complex indoor scenes. In addition, some image-based methods handle indoor reconstruction with prior assumptions such as the Manhattan-world assumption. Although these methods sometimes achieve good results, they often produce erroneous reconstructions when the assumptions do not hold.
Summary of the invention
To solve the above problems in the prior art, namely that image-based modeling of indoor scenes with complex structure and scarce texture yields incomplete and inaccurate results, a first aspect of the invention proposes a scene modeling method fusing aerial and ground multi-view images, comprising the following steps:
Step S100: acquire aerial multi-view images of the indoor scene to be modeled and construct an aerial map;
Step S200: based on the aerial map, obtain synthesized images by synthesizing ground-view reference images from the aerial map;
Step S300: acquire ground multi-view images with a ground camera to obtain a ground multi-view image set;
Step S400: based on the synthesized images, fuse the aerial multi-view images with the ground multi-view images to obtain an indoor scene model.
In some preferred embodiments, in step S100, "acquire aerial multi-view images of the indoor scene to be modeled and construct an aerial map" is performed as follows:
extract image frames from the aerial multi-view video of the indoor scene with an adaptive video frame extraction method based on a bag-of-words model, obtaining the aerial multi-view image set of the indoor scene;
based on the aerial multi-view image set, construct the aerial map with an image-based modeling method.
In some preferred embodiments, in step S200, "synthesizing ground-view reference images from the aerial map" is performed as follows:
compute the virtual camera poses based on the aerial map;
obtain the synthesized images of the ground-view reference images from the aerial map with a graph-cut algorithm.
In some preferred embodiments, "obtain the synthesized images of the ground-view reference images from the aerial map with a graph-cut algorithm" minimizes the energy

E(l) = Σ_{t_i ∈ T} D_i(l_i) + Σ_{(t_i, t_j) ∈ N} V_i(l_i, l_j)

wherein E(l) is the energy function of the graph cut; T is the set of 2D triangles obtained by projecting the 3D mesh visible to the virtual camera, and t_i is its i-th triangle; N is the set of triangle pairs in T sharing a common edge; l_i is the aerial image index of t_i; D_i(l_i) is the data term; V_i(l_i, l_j) is the smoothness term.
When the 3D face corresponding to t_i is visible in the l_i-th aerial image, the data term D_i(l_i) = s̄_{l_i}/a_{l_i}; otherwise D_i(l_i) = α, where s̄_{l_i} is the median scale of the local features in the l_i-th aerial image, a_{l_i} is the projected area of the face corresponding to t_i on the l_i-th aerial image, and α is a large constant.
When l_i = l_j, the smoothness term V_i(l_i, l_j) = 0; otherwise V_i(l_i, l_j) = 1.
In some preferred embodiments, in step S300, "acquire ground multi-view images with a ground camera to obtain the ground multi-view image set" is performed as follows:
the ground robot moves along the planned path and continuously acquires ground multi-view video with the ground camera mounted on it;
image frames are extracted from the ground multi-view video of the indoor scene with the adaptive video frame extraction method based on the bag-of-words model, obtaining the ground multi-view image set of the indoor scene.
In some preferred embodiments, during "the ground robot moves along the planned path and continuously acquires ground multi-view video with the ground camera mounted on it", localization comprises initial robot localization and robot motion localization.
Initial robot localization: obtain the robot's initial position in the aerial map from the first frame of the video acquired by the ground camera, and take this position as the starting point of the robot's subsequent motion.
Robot motion localization: coarsely locate the robot based on the initial position and the robot's odometry at each moment; obtain the robot's current position in the aerial map by matching the video frame acquired at the current moment against the synthesized images; and correct the coarse localization with this position.
In some preferred embodiments, in step S400, "based on the synthesized images, fuse the aerial multi-view images with the ground multi-view images to obtain the indoor scene model" is performed as follows:
obtain the position in the aerial map of the ground camera corresponding to every image in the ground multi-view image set;
link the match points of ground multi-view images and synthesized images into the original aerial and ground feature point tracks to generate cross-view constraints;
optimize the aerial and ground image poses by bundle adjustment (BA);
perform dense reconstruction with the aerial and ground multi-view images to obtain the dense model of the indoor scene.
A second aspect of the invention proposes a scene modeling system fusing aerial and ground multi-view images, comprising an aerial map construction module, a synthesized image acquisition module, a multi-view image set acquisition module, and an indoor scene model acquisition module;
the aerial map construction module is configured to acquire the aerial multi-view images of the indoor scene to be modeled and construct the aerial map;
the synthesized image acquisition module is configured to obtain, based on the aerial map, the synthesized images by synthesizing ground-view reference images from the aerial map;
the multi-view image set acquisition module is configured to acquire ground multi-view images with a ground camera to obtain the ground multi-view image set;
the indoor scene model acquisition module is configured to fuse, based on the synthesized images, the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
A third aspect of the invention proposes a storage device storing a plurality of programs, the programs being suitable to be loaded and executed by a processor to realize the above scene modeling method fusing aerial and ground multi-view images.
A fourth aspect of the invention proposes a processing device comprising a processor and a storage device; the processor is suitable for executing programs; the storage device is suitable for storing a plurality of programs; the programs are suitable to be loaded and executed by the processor to realize the above scene modeling method fusing aerial and ground multi-view images.
Beneficial effects of the invention:
The invention constructs a 3D aerial map to guide a robot moving through the indoor scene while acquiring ground-view images, then fuses the aerial and ground images, and generates a complete, accurate indoor scene model from the fused images. The indoor scene reconstruction pipeline of the invention balances acquisition efficiency against reconstruction accuracy and is highly robust.
Description of the drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is a schematic flow diagram of the scene modeling method fusing aerial and ground multi-view images according to an embodiment of the present invention;
Fig. 2 is an example of an aerial map reconstructed from 271 extracted video frames in an embodiment of the present invention;
Fig. 3 is a schematic diagram of mesh-based image synthesis in an embodiment of the present invention;
Fig. 4 is an example of the relationship between local feature scale and image sharpness in an embodiment of the present invention;
Fig. 5 is an example of graph-cut image synthesis results under different configurations in an embodiment of the present invention;
Fig. 6 shows further image synthesis results and, for comparison, ground images under similar viewpoints;
Fig. 7 is an example of image matching results in an embodiment of the present invention;
Fig. 8 is a schematic diagram of the candidate synthesized-image search during robot motion in an embodiment of the present invention;
Fig. 9 is a schematic flow diagram of batched camera localization in an embodiment of the present invention;
Fig. 10 is an example of batched camera localization results based on three kinds of feature point tracks in an embodiment of the present invention;
Fig. 11 is an example of the batched camera localization process in an embodiment of the present invention;
Fig. 12 is a schematic diagram of generating cross-view aerial-ground feature point tracks in an embodiment of the present invention;
Fig. 13 shows the data acquisition equipment used in the tests of an embodiment of the present invention;
Fig. 14 shows example aerial images from the Hall data set and the generated 3D aerial map in the tests of an embodiment of the present invention;
Fig. 15 is an example comparison, on the aerial video of the Hall data set, between the frame extraction method of the present invention and extraction at equal intervals;
Fig. 16 is an example of the qualitative comparison of ground camera localization in the tests of an embodiment of the present invention;
Fig. 17 is an example of qualitative indoor scene reconstruction results in the tests of an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, as long as no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other.
Owing to the complexity of indoor scenes, image-based methods must address two problems to achieve complete reconstruction. The first is image acquisition: how to acquire images that cover the indoor scene completely and efficiently. The second is the scene reconstruction algorithm: how to fuse images from different viewpoints during SfM and MVS to obtain complete, accurate results. For these two problems, the invention proposes a novel image-based indoor scene acquisition and reconstruction pipeline. The pipeline uses a mini aircraft and a ground robot and comprises four main steps (as shown in Fig. 1): (1) aerial map construction: aerial multi-view images are acquired indoors with a mini aircraft, a 3D mesh characterizing the indoor scene is then obtained from the aerial multi-view images, and this mesh serves as the map for ground robot localization and navigation; (2) reference image synthesis: plane detection is performed on the aerial map to obtain the ground plane used for ground robot path planning; then several ground-view images are synthesized from the aerial map for ground robot localization; (3) ground robot localization: the ground robot enters the indoor scene and acquires ground multi-view images; while the robot moves and acquires images, it is localized by matching the acquired images against the synthesized ground-view images; (4) indoor scene reconstruction: after the ground robot completes image acquisition, the mini-aircraft images and the ground-robot images are fused by an image-based modeling pipeline to achieve complete, accurate modeling of the indoor scene.
In the modeling pipeline of the invention, manual operation is needed only during aerial image acquisition; the subsequent ground image acquisition and indoor scene modeling are fully automatic. This means the pipeline scales well and is suitable for the acquisition and reconstruction of large indoor scenes. Aerial images could also be collected automatically by autonomous navigation along an acquired flight path, but that would increase the complexity of the algorithm; manual operation is therefore preferred, to guarantee the flexibility, completeness, and scalability of image acquisition.
Compared with the ground images acquired by the ground robot, the aerial images acquired by the mini aircraft have better viewpoints and larger fields of view, which means occlusion and mismatching problems are smaller in aerial images than in ground images. A map generated from aerial images can therefore be used more reliably in the subsequent ground robot localization.
The aerial images shot by the mini aircraft and the ground images shot by the ground robot complement each other and together cover the indoor scene completely. Therefore, by fusing aerial and ground images, a more complete, accurate indoor scene model can be obtained.
The scene modeling method fusing aerial and ground multi-view images of the invention comprises the following steps:
Step S100: acquire aerial multi-view images of the indoor scene to be modeled and construct the aerial map;
Step S200: based on the aerial map, obtain synthesized images by synthesizing ground-view reference images from the aerial map;
Step S300: acquire ground multi-view images with a ground camera to obtain the ground multi-view image set;
Step S400: based on the synthesized images, fuse the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
To describe the scene modeling method fusing aerial and ground multi-view images of the invention more clearly, each step of one embodiment of the method is detailed below with reference to the drawings.
The scene modeling method fusing aerial and ground multi-view images according to an embodiment of the invention comprises steps S100-S400.
Step S100: acquire aerial multi-view images of the indoor scene to be modeled and construct the aerial map.
First, an aerial video of the indoor scene is acquired with a mini aircraft, and some images are extracted from the video. The extracted images are then reconstructed with an image-based modeling pipeline to obtain the aerial model, which serves as the 3D map for ground robot localization.
Step S101: extract image frames from the aerial multi-view video of the indoor scene with the adaptive video frame extraction method based on the bag-of-words model, obtaining the aerial multi-view image set of the indoor scene.
In this embodiment, a top-down aerial multi-view video is acquired by the mini aircraft in the indoor scene, at 1080p resolution and a frame rate of 25 FPS. A mini UAV is small and agile and thus well suited to indoor shooting; for example, the mini aircraft used in this embodiment is a DJI Spark equipped with a stabilizer and a 4K camera, weighing only 300 g. In addition, shooting the indoor scene from an aerial viewpoint suffers less from occlusion than shooting from a ground viewpoint, so the mini aircraft can cover the scene more efficiently and completely.
Given the acquired aerial video, the aerial map could be built by a simultaneous localization and mapping (SLAM) system. In this embodiment, however, offline SfM is used for aerial map construction, because: (1) in this embodiment the aerial map is used for ground robot localization and need not be built online; (2) compared with SLAM, which is prone to scene drift, SfM is more suitable for large-scale scene modeling. When SfM is used, clearly not all frames of the aerial video are needed: the frames contain a large amount of redundancy, which would severely reduce the efficiency of SfM map construction. A straightforward solution is to extract one frame per fixed number of frames and build the map from the extracted frames. This, however, still has shortcomings: (1) stable, constant-speed video acquisition is difficult with a manually operated mini aircraft indoors, and becomes harder at path corners; (2) because texture richness varies across an indoor scene, covering the scene uniformly is also inappropriate. To solve these problems of aerial map construction, this embodiment uses an adaptive video frame extraction method based on a bag-of-words (BoW) model, detailed as follows:
In the BoW model, an image is represented as a normalized vector v_i, and the similarity of an image pair is the dot product of the corresponding vectors, s_{i,j} = v_i · v_j. As is known to those skilled in the art, too high a similarity between adjacent images introduces excessive redundancy and reduces mapping efficiency, while too low a similarity leads to poor connectivity between images and incomplete maps. This embodiment therefore adaptively extracts a subset of all video frames such that the similarity between each extracted frame and the adjacent extracted frame stays within a suitable range. Specifically, the normalized vector v_i of every frame is first generated with the libvot library, and the first frame is taken as the starting point. During extraction, suppose the current i-th frame has been extracted; the similarity scores between this frame and its subsequent frames, {s_{i,j} | j = i+1, i+2, ...}, are compared with a preset similarity threshold t (t = 0.1 in this embodiment). Let j* be the first index satisfying s_{i,j} < t; then the (j* − 1)-th frame, i.e., the frame immediately before the first one satisfying the inequality, is the next extracted frame. The process iterates until all video frames have been examined.
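For illustration only (the patent provides no code), the following Python sketch implements the adaptive extraction rule just described, assuming every frame has already been converted to an L2-normalized BoW vector (e.g., by libvot); all names are illustrative:

```python
import numpy as np

def adaptive_frame_extraction(bow_vectors, t=0.1):
    """Adaptively select a subset of video frames (a sketch of step S101).

    bow_vectors: list of L2-normalized BoW vectors, one per video frame.
    t: similarity threshold (t = 0.1 in the embodiment).
    Returns the indices of the selected key frames.
    """
    keyframes = [0]                     # the first frame is the starting point
    i = 0
    while True:
        j = i + 1
        # scan forward until similarity to the current key frame drops below t
        while j < len(bow_vectors) and np.dot(bow_vectors[i], bow_vectors[j]) >= t:
            j += 1
        if j >= len(bow_vectors):
            break                       # all frames examined
        # frame j is the first with s_{i,j} < t, so keep frame j - 1
        i = j - 1 if j - 1 > i else j   # guard against zero progress
        keyframes.append(i)
    return keyframes
```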
Step S102: based on the aerial multi-view image set, construct the aerial map with the image-based modeling method.
Based on the aerial multi-view image set obtained in step S101, the aerial map is built by a standard image-based modeling pipeline comprising: (1) SfM, (2) MVS, (3) surface reconstruction. In addition, because GPS signals are unavailable indoors, the aerial map can be scaled to its actual physical size using ground control points (GCPs). Fig. 2 shows an example aerial map reconstructed from 271 extracted video frames: the first three columns show example aerial images and the corresponding regions of the 3D aerial map; the fourth column shows the entire 3D aerial map; the fifth column shows the robot path planning and virtual camera pose computation results on the aerial map, where the ground plane is marked light gray, the planned path is marked by line segments, and virtual camera poses are indicated by pyramids.
Step S200: based on the aerial map, obtain synthesized images by synthesizing ground-view reference images from the aerial map.
The aerial map constructed in step S100 of this embodiment plays two roles in the subsequent process: first, it is used for ground robot path planning and for localization during robot motion; second, it helps fuse aerial and ground images during indoor scene reconstruction. Both processes require 2D-to-3D point correspondences between ground images and the aerial map. One effective way to obtain such correspondences would be to match aerial and ground images directly; however, direct matching is very difficult because the viewpoints of the two image types differ enormously. Here, this embodiment solves the problem by synthesizing ground-view reference images from the aerial map. The reference images are synthesized in two steps: virtual camera pose computation and graph-cut-based image synthesis.
Step S201: compute the virtual camera poses based on the aerial map.
The virtual camera poses used for reference image synthesis are computed from the ground plane of the indoor scene; in this embodiment the ground plane of the aerial map is detected with a plane detection method based on random sample consensus (RANSAC) (see Fig. 2). The virtual camera poses are computed in two steps: positions first, then orientations.
Step S2011: virtual camera position computation.
The 2D bounding box of the ground plane is computed and divided into square grid cells; the cell size determines the number of virtual cameras. To balance localization accuracy and efficiency, the cell side length is set to 1 m in this embodiment. A cell is considered valid for placing a virtual camera when the ground-plane area inside it exceeds 50% of the total cell area. Each virtual camera position is set at a valid cell center with a height offset h above the plane (see Fig. 2). The value of h is determined by the height of the ground camera and is set to 1 m in this embodiment.
Step S2012: virtual camera orientation design.
After the virtual camera positions are obtained, multiple virtual cameras with identical optical centers but different orientations must be placed at each position to achieve omnidirectional observation of the scene. In this embodiment, because the optical axis of the camera mounted on the ground robot is approximately parallel to the ground plane, only horizontally oriented virtual cameras are generated. In addition, to eliminate perspective distortion between ground and synthesized images, the field of view (intrinsic parameters) of the virtual cameras is set close to that of the ground camera. In this embodiment, 6 virtual cameras are placed at each virtual camera position, with a yaw angle of 60° between adjacent cameras.
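A minimal Python sketch of steps S2011-S2012 follows, under the assumptions (not stated in the patent) that the aerial map has been aligned so the detected ground plane is z = 0 and that the plane is given as a dense, roughly uniform point sampling, so point counts approximate covered area:

```python
import numpy as np

def virtual_camera_poses(plane_pts, cell=1.0, h=1.0, n_dirs=6, min_cover=0.5):
    """Grid the ground plane and place horizontal virtual cameras (a sketch).

    plane_pts: (N, 3) samples of the detected ground plane at z = 0.
    Returns a list of (position, yaw) pairs.
    """
    xy = plane_pts[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    nx, ny = np.ceil((hi - lo) / cell).astype(int)
    density = len(xy) / np.prod(hi - lo)       # samples per square metre
    full_cell = density * cell * cell          # expected samples in a full cell
    poses = []
    for ix in range(nx):
        for iy in range(ny):
            c = lo + (np.array([ix, iy]) + 0.5) * cell
            inside = np.all(np.abs(xy - c) <= cell / 2, axis=1)
            if inside.sum() < min_cover * full_cell:
                continue                       # <50% of the cell is ground plane
            for k in range(n_dirs):            # 6 horizontal headings, 60 deg apart
                poses.append((np.array([c[0], c[1], h]), k * 2 * np.pi / n_dirs))
    return poses
```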
In addition, the path for ground robot motion is also planned from the detected ground plane. Since this embodiment does not focus on planning an optimal path for the ground robot, the skeleton of the detected ground plane is used as the robot path; the skeleton is extracted by the medial axis transform (see Fig. 2).
Step S202: obtain the synthesized ground-view reference images from the aerial map with a graph-cut algorithm.
This embodiment synthesizes images by means of the continuous mesh of the scene, as shown in Fig. 3: f is a 3D face whose 2D projected triangles on the aerial camera C_a and the virtual ground camera C_v are denoted t_a and t_v respectively, and image synthesis warps t_a to t_v through f. Specifically, the mesh faces visible to each aerial and virtual camera are obtained first. Then, for each virtual camera, its visible mesh faces are projected onto the camera to form a set of 2D triangles. When synthesizing a virtual image, the aerial image warped to fill each 2D triangle is determined by three factors: (1) visibility — for the 3D face corresponding to the 2D triangle, the chosen aerial image should have a good viewing angle and a short viewing distance; (2) sharpness — since some frames extracted from the indoor aerial video are blurry, a sufficiently sharp aerial image should be chosen; (3) consistency — adjacent triangles in the virtual image should be synthesized from the same aerial image as far as possible, to keep the synthesized image consistent. In this embodiment, the visibility factor is measured by the projected area of the face on the aerial image (the larger the better), and the sharpness factor is measured by the median scale of the local features of the aerial image (the smaller the better); see Fig. 4, where the left two columns show the two images with the largest median local feature scale, the right two columns show the two images with the smallest, and the second row shows enlargements of the rectangular regions in the first row. Based on the above, the image synthesis problem in this embodiment can be cast as a multi-label optimization problem, defined as in formula (1):

E(l) = Σ_{t_i ∈ T} D_i(l_i) + Σ_{(t_i, t_j) ∈ N} V_i(l_i, l_j)    (1)

where E(l) is the energy function of the graph cut; T is the set of 2D triangles obtained by projecting the mesh faces visible to the virtual camera, and t_i is its i-th triangle; N is the set of triangle pairs in T sharing a common edge; l_i is the label of t_i, i.e., the index of an aerial image. When the 3D face corresponding to t_i is visible in the l_i-th aerial image, the data term is D_i(l_i) = s̄_{l_i}/a_{l_i}, where s̄_{l_i} is the median scale of the local features of the l_i-th aerial image and a_{l_i} is the projected area of the face corresponding to t_i on the l_i-th aerial image; otherwise D_i(l_i) = α, where α is a large constant (α = 10^4 in this embodiment) penalizing this case. When l_i = l_j, the smoothness term V_i(l_i, l_j) = 0; otherwise V_i(l_i, l_j) = 1. The optimization problem defined in formula (1) can be solved efficiently by a graph-cut algorithm.
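As an illustration of formula (1), the sketch below assembles the data and smoothness costs and solves them with a few sweeps of iterated conditional modes (ICM) as a simple stand-in for the alpha-expansion graph cut used in practice; the array layouts are assumptions, not the patent's implementation:

```python
import numpy as np

def synthesize_labels(visible, scale_med, proj_area, edges, alpha=1e4, n_sweeps=10):
    """Assign one aerial image label per projected triangle (formula (1)).

    visible:   (n_tri, n_img) bool, face of t_i visible in aerial image l.
    scale_med: (n_img,) median local-feature scale of each aerial image.
    proj_area: (n_tri, n_img) projected area of the face of t_i on image l.
    edges:     list of (i, j) pairs of triangles sharing a common edge.
    """
    # data term: D_i(l) = s_l / a_{i,l} when visible, the penalty alpha otherwise
    D = np.where(visible, scale_med[None, :] / np.maximum(proj_area, 1e-9), alpha)
    labels = D.argmin(axis=1)                  # initialize from the data term alone
    n_tri, n_img = D.shape
    nbrs = [[] for _ in range(n_tri)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(n_sweeps):                  # ICM: greedy per-triangle updates
        for i in range(n_tri):
            # smoothness V(l_i, l_j) = [l_i != l_j], summed over neighbours
            smooth = np.fromiter((sum(labels[j] != l for j in nbrs[i])
                                  for l in range(n_img)), float, count=n_img)
            labels[i] = int(np.argmin(D[i] + smooth))
    return labels
```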
To illustrate the influence of the sharpness and consistency factors, image synthesis is performed on one of the virtual cameras of this embodiment under four configurations; the results are shown in Fig. 5, from left to right: considering neither the sharpness factor nor the consistency factor; considering only the consistency factor; considering only the sharpness factor; considering both. The large rectangle in the upper-right corner of each image is an enlargement of the small rectangle. Fig. 5 shows that the sharpness factor makes the synthesized image sharper, while the consistency factor leaves fewer holes and jagged edges. In addition, Fig. 6 shows further image synthesis results together with ground images under similar viewpoints. Although some synthesis errors remain hard to avoid, the synthesized images and the corresponding ground images are quite similar in their common visible regions, which validates the image synthesis method of this embodiment. The synthesized images of this step serve as the reference database for ground robot localization.
Step S300: acquire ground multi-view images with the ground camera to obtain the ground multi-view image set.
When the ground robot is placed in the indoor scene, it moves along the planned path and automatically acquires ground multi-view video. If the robot were localized only by its built-in sensors, such as wheel encoders and an inertial measurement unit (IMU), it would not move strictly along the planned path, because built-in robot sensors accumulate error; this problem is particularly pronounced for the low-cost sensors mounted on consumer-level robots. The robot pose therefore needs to be corrected by visual localization, which in this step is achieved by matching synthesized and ground images.
Step S301: the ground robot moves along the planned path and continuously acquires ground multi-view video with the ground camera mounted on it.
In this step, localization comprises initial robot localization and robot motion localization.
(1) Initial robot localization
By localizing the first frame of the video acquired by the ground camera, the robot's initial position in the aerial map is obtained and taken as the starting point of its subsequent motion. This initial localization is achieved by matching the first frame image against the k most similar synthesized images retrieved from all synthesized images by a vocabulary tree; this step uses an image-retrieval-based method with k = 30. It should be noted that, although ground-view images have been synthesized, ground and synthesized images still differ considerably in illumination and sharpness, and the common scale-invariant feature transform (SIFT) feature is not sufficient to cope with this; this step therefore uses the ASIFT (affine-SIFT) feature.
To verify the effectiveness of the image synthesis method of this step and to compare the performance of the SIFT and ASIFT features, this embodiment matches synthesized images against ground images and aerial images against ground images, each with both SIFT and ASIFT. The ground images here are likewise extracted from the video acquired by the ground robot with the adaptive BoW-based frame extraction method of step S100. During matching, for each ground image, varying numbers of the most similar synthesized and aerial images are retrieved; an image pair is regarded as matched when the number of match points surviving fundamental-matrix verification exceeds 16. The matching results are shown in Fig. 7 (x-axis: number of retrieved images; y-axis: number of matched image pairs). As Fig. 7 shows, matching synthesized and ground images with ASIFT yields roughly 6, 8, and 19 times as many matched pairs as matching aerial and ground images with ASIFT, matching synthesized and ground images with SIFT, and matching aerial and ground images with SIFT, respectively.
Given the 2D match points between the first ground frame and the retrieved synthesized images, the corresponding 3D points on the aerial map can be obtained by ray casting, so the first ground frame can be localized by perspective-n-point (PnP) methods. Specifically, given the 2D-to-3D correspondences and the ground camera intrinsics, the camera pose is solved by RANSAC with different PnP algorithms: P3P, AP3P, and EPnP. When at least one of these algorithms yields more than 16 inliers, the pose estimation is considered successful, and the camera pose is set to the PnP result with the most inliers. In the RANSAC of this embodiment, 500 random samples are drawn and the distance threshold is set to 4 px.
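The patent names the PnP algorithms but not an implementation; a hedged sketch using OpenCV's solvePnPRansac, with the embodiment's thresholds, might look as follows:

```python
import cv2
import numpy as np

def localize_frame(pts2d, pts3d, K, dist=None):
    """PnP-RANSAC localization as described above (a sketch, not the patent's code).

    pts2d: (N, 2) float32 image points;  pts3d: (N, 3) float32 map points.
    K: 3x3 camera intrinsics. Returns (R, t) on success, None otherwise.
    """
    best = None
    for flag in (cv2.SOLVEPNP_P3P, cv2.SOLVEPNP_AP3P, cv2.SOLVEPNP_EPNP):
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d, pts2d, K, dist,
            iterationsCount=500,          # 500 random samples
            reprojectionError=4.0,        # 4 px distance threshold
            flags=flag)
        if ok and inliers is not None and len(inliers) > 16:
            if best is None or len(inliers) > best[0]:
                best = (len(inliers), rvec, tvec)  # keep the most inliers
    if best is None:
        return None                       # localization failed
    R, _ = cv2.Rodrigues(best[1])
    return R, best[2]
```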
(2) Robot motion localization
Robot motion localization first coarsely locates the robot based on the initial position and the robot's odometry at each moment, then matches the currently acquired video frame against the synthesized images to obtain the robot's current position in the aerial map, and corrects the coarse localization with this position.
When the ground robot moves in the indoor scene and acquires video, it can be coarsely localized by the wheel odometer. In this step, the coarse localization is corrected by globally localizing the ground robot on the aerial map through the matching of ground and synthesized images. Pose correction is performed only on the extracted ground video frames, not on all frames, because: (1) the ground robot moves slowly indoors and will not deviate substantially from the planned path within a short period; (2) each global visual localization takes about 0.5 s, mostly spent on ASIFT feature extraction. Note that for some extracted frames, visual localization may fail because the number of inliers for PnP is insufficient.
Suppose the position and orientation of the last successfully localized ground image are denoted c_A and n_A, and the rough position and orientation of the current ground image to be localized, given by the wheel odometer, are denoted c_B and n_B. Here, the candidate matching synthesized images of the current ground image are found from the coarse localization rather than by image retrieval. The scheme is shown in Fig. 8: c_A and n_A are the position and orientation of the last successfully localized ground image, c_B and n_B are the rough position and orientation of the current ground image, the circle represents the search range with center c_B and radius r_B, the pyramids represent virtual camera poses, with light gray pyramids the selected synthesized images and dark gray the unselected ones. A synthesized image is matched against the current ground image when it satisfies two conditions: (1) the synthesized image lies within the circle of center c_B and radius r_B, where r_B = max(‖c_B − c_A‖, β) and β = 2 m; (2) the angle between the synthesized image orientation and n_B is less than 90°. A variable radius r_B is used because the drift of the relative pose given by the robot's built-in sensors grows as the robot moves. After the current ground image is matched against the candidate synthesized images, it is localized by the PnP-based RANSAC method, as in initial robot localization. If the localization result is sufficiently close to the coarse localization in position and orientation (in this embodiment, position deviation less than 5 m and orientation deviation less than 30°), the current ground frame is successfully localized. The robot pose is then corrected by the currently localized ground image, and the wheel odometer pose is reset to the current vision-based result. Ground images not successfully localized in this step will be relocalized in the subsequent indoor scene reconstruction.
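A minimal sketch of the candidate search of Fig. 8 follows, assuming camera orientations are given as unit direction vectors (an assumption of this illustration):

```python
import numpy as np

def candidate_synthetic_views(c_A, c_B, n_B, cams, beta=2.0):
    """Select candidate synthesized images for the current ground image.

    cams: list of (position, direction) pairs of the virtual cameras,
          with unit direction vectors. Returns the candidate indices.
    """
    r_B = max(np.linalg.norm(c_B - c_A), beta)   # variable search radius
    out = []
    for idx, (p, d) in enumerate(cams):
        if np.linalg.norm(p - c_B) > r_B:
            continue                             # outside the search circle
        if np.dot(d, n_B) <= 0.0:                # angle to n_B must be < 90 deg
            continue
        out.append(idx)
    return out
```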
Step S302: extract image frames from the ground multi-view video of the indoor scene with the adaptive BoW-based video frame extraction method, obtaining the ground multi-view image set of the indoor scene.
This step extracts image frames from the acquired ground multi-view video of the indoor scene with the adaptive BoW-based frame extraction method of step S100, obtaining the ground multi-view image set of the indoor scene; since the method is identical, it is not repeated here.
Step S400: based on the synthesized images, fuse the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
After robot localization and video acquisition, not all frames extracted from the ground video have been successfully localized to the aerial map. To obtain a complete indoor scene reconstruction, however, all images extracted from the (aerial and ground) videos need to be localized and fused. Here, a batched localization procedure for the previously unlocalized ground images is proposed first. Then, the match points between ground and synthesized images are linked into the original feature point tracks, and the aerial and ground point clouds are fused by bundle adjustment (BA). Finally, a complete, dense indoor scene reconstruction is obtained by fusing the aerial and ground images.
Step S401: obtain the position in the aerial map of the ground camera corresponding to every image in the ground multi-view image set.
For the ground images not successfully localized in step S301, the invention proposes a batched camera localization procedure that localizes as many cameras as possible in each localization loop. The 3D points of the 2D-to-3D correspondences used here for camera localization include not only the spatial points reconstructed during SfM but also the spatial points obtained by intersecting cast rays with the aerial map (the 3D mesh). Each batched camera localization loop comprises three steps: (1) camera localization, (2) scene extension and bundle adjustment (BA), (3) camera filtering; the flow is shown in Fig. 9. Before batched camera localization, the images extracted from the ground video are matched and the match points are linked into feature point tracks. For the feature point tracks visible in at least two successfully localized images, the spatial coordinates are obtained by triangulation.
Step S4011: camera localization.
There are two ways to obtain 2D-to-3D correspondences for localizing the currently unlocalized ground images. (1) Aerial map: for a 2D feature point in a currently unlocalized ground image, its match points in successfully localized images can be obtained; rays are then cast from the optical centers of the localized cameras through these match points, and the intersections of the rays with the aerial map are the 3D points corresponding to the 2D feature points of the currently unlocalized ground image. (2) Ground feature point tracks: given the ground feature point tracks currently obtained by triangulation, the corresponding 2D feature points in the currently unlocalized ground image can be obtained through the matching results between ground images. A currently unlocalized ground camera can be localized by the PnP-based RANSAC method with either kind of 2D-to-3D correspondence, and the result with the larger inlier count is used. Localizing cameras with both kinds of correspondences is compared here against using either one alone; the results are shown in Fig. 10, which gives the batched camera localization results based on (1) the aerial map and the ground feature point tracks, (2) only the aerial map, and (3) only the ground feature point tracks. The x-axis is the number of batched localization loops and the y-axis is the number of successfully localized cameras; at x = 0, the y value is the number of cameras successfully localized in step S300. As Fig. 10 shows, after several iterations all three approaches localize the same number of cameras; however, the approach using both kinds of 2D-to-3D correspondences needs the fewest iterations (only 5, versus 6 and 8 for the other two).
Step S4012: scene extension and BA.
After camera localization, the ground feature point tracks are triangulated according to the newly localized cameras to extend the scene. To improve the accuracy of camera poses and scene points, the localized ground camera poses and the triangulated positions of the ground feature point tracks are optimized by BA after triangulation.
Step S4013: camera filtering.
For robustness, a camera filtering operation on the successfully localized cameras is added after BA. If, for a camera newly localized in the current iteration, the BA-optimized position or orientation deviates substantially from its coarse localization (the wheel odometer result) — position deviation greater than 5 m or orientation deviation greater than 30° — the localization is judged unreliable and filtered out. Note that a camera filtered out in the current iteration may still be localized successfully in subsequent iterations of this step.
The above three steps iterate until all cameras are successfully localized or no more cameras can be localized. The batched camera localization process is shown in Fig. 11, where pyramids indicate successfully localized camera poses and the 0th iteration shows the camera localization results of step S300.
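One possible structure for this loop is sketched below; the four callables are placeholders for the PnP localizer (tried on both correspondence sources), scene triangulation, BA, and the odometry-deviation check, not a real API:

```python
def batched_localization(unlocalized, localize, triangulate, bundle_adjust,
                         deviation, max_pos=5.0, max_ang=30.0):
    """Sketch of the batched camera localization loop (Fig. 9).

    unlocalized: set of ground camera ids still to be localized.
    Returns the dict of successfully localized cameras.
    """
    localized = {}
    while unlocalized:
        newly = {}
        for cam in list(unlocalized):
            # keep whichever correspondence source yields more PnP inliers
            results = [r for r in (localize(cam, "aerial_map"),
                                   localize(cam, "ground_tracks")) if r]
            if results:
                newly[cam] = max(results, key=lambda r: r["num_inliers"])
        if not newly:
            break                                   # nothing localized: stop
        localized.update(newly)
        unlocalized -= set(newly)
        triangulate(localized)                      # scene extension
        bundle_adjust(localized)                    # refine poses and points
        for cam in list(newly):                     # camera filtering
            d_pos, d_ang = deviation(cam, localized[cam])
            if d_pos > max_pos or d_ang > max_ang:  # 5 m / 30 deg thresholds
                del localized[cam]
                unlocalized.add(cam)                # may succeed in a later loop
    return localized
```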
Step S402: link the match points of ground and synthesized images into the original aerial and ground feature point tracks to generate cross-view constraints.
To fuse the aerial and ground point clouds by BA, constraints between aerial and ground images must be introduced. Here, such cross-view constraints are provided by the aerial and ground feature point tracks generated from the ground-to-synthesized image match points of step S300. Matched ground image feature points can easily be linked into the original ground feature point tracks by looking up their indices. However, although the synthesized images are generated from aerial images, linking matched synthesized-image feature points into the original aerial feature point tracks is less straightforward, because the feature points matched against ground images were re-extracted on the synthesized images. In this step, the ground-to-synthesized match points are propagated to the aerial views by ray casting and point projection; the process is sketched in Fig. 12, where C_i (i = 1, 2, 3) are aerial cameras, X_j are the spatial points corresponding to matched synthesized-image feature points, t_ij is the projection of point X_j on camera C_i, and t_1j-t_2j-t_3j (j = 1, 2) is the j-th cross-aerial-view feature point track. Specifically, the spatial points corresponding to the matched synthesized-image feature points are first obtained on the aerial map by ray casting; the obtained spatial points are then projected onto the aerial images in which they are visible to generate aerial-ground feature point tracks.
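A sketch of this ray casting and projection, using the trimesh library as an assumed stand-in for the patent's (unnamed) mesh tooling; the per-camera visibility test is omitted for brevity:

```python
import numpy as np
import trimesh

def lift_matches_to_aerial(mesh, cam_center, rays, aerial_cams):
    """Cast rays through matched 2D points and project the hits (step S402).

    cam_center:  optical center of the synthesized (virtual) camera.
    rays:        (N, 3) unit ray directions through the matched 2D points.
    aerial_cams: list of (K, R, t) tuples of localized aerial cameras.
    Returns the 3D hit points, the surviving ray indices, and the 2D
    projections on each aerial camera.
    """
    origins = np.tile(cam_center, (len(rays), 1))
    pts, idx_ray, _ = mesh.ray.intersects_location(origins, rays,
                                                   multiple_hits=False)
    tracks = []
    for K, R, t in aerial_cams:
        proj = (K @ (R @ pts.T + t[:, None])).T   # project onto aerial image
        proj = proj[:, :2] / proj[:, 2:3]         # perspective divide
        tracks.append(proj)                       # visibility test omitted
    return pts, idx_ray, tracks
```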
Step S403: optimize the aerial map and the ground multi-view image point cloud by BA.
In this step, the Ceres library is used to globally optimize, by minimizing reprojection error, the connected aerial-ground feature point tracks, the original (aerial and ground) feature point tracks, and the intrinsic and extrinsic parameters of all (aerial and ground) cameras.
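Ceres is a C++ library; as an illustrative stand-in, the sketch below expresses the same reprojection residual for scipy.optimize.least_squares, packing each camera as an axis-angle rotation plus translation and assuming shared intrinsics K (both assumptions of this sketch):

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, cam_idx, pt_idx, obs, K):
    """Residuals of a minimal BA: params holds 6 values per camera
    (axis-angle r, translation t), then the flattened 3D points."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(-1, 3)
    r, t = cams[cam_idx, :3], cams[cam_idx, 3:]
    p = pts[pt_idx]
    # rotate by axis-angle (Rodrigues formula), then translate
    theta = np.linalg.norm(r, axis=1, keepdims=True) + 1e-12
    k = r / theta
    p_rot = (p * np.cos(theta)
             + np.cross(k, p) * np.sin(theta)
             + k * np.sum(k * p, axis=1, keepdims=True) * (1 - np.cos(theta)))
    p_cam = p_rot + t
    proj = (K @ p_cam.T).T
    return ((proj[:, :2] / proj[:, 2:3]) - obs).ravel()

# usage sketch: x0 packs initial camera and point parameters
# result = least_squares(reprojection_residuals, x0, method="trf",
#                        args=(n_cams, cam_idx, pt_idx, obs, K))
```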
Step S404: using the aerial and ground camera poses optimized in step S403, fuse the aerial and ground images and perform dense reconstruction to obtain the dense model of the indoor scene.
Because cross-view constraints between aerial and ground views are introduced into the optimization, and aerial and ground images are fused in the dense reconstruction, the reconstructed model is more complete and accurate than a model reconstructed from single-source images alone.
To verify the scene modeling method fusing aerial and ground multi-view images of this embodiment, two indoor scene data sets were collected with the aerial and ground data acquisition equipment shown in Fig. 13, and the method of this embodiment was tested on the two data sets.
1. Data sets
Because there is currently almost no public aerial-and-ground image data set for indoor scenes, two indoor scene data sets were collected for method evaluation in this test. Specifically, aerial-view scene acquisition was carried out with a DJI Spark mini aircraft, and ground-view scene acquisition with a GoPro HERO4 mounted on a TurtleBot; the data acquisition equipment is shown in Fig. 13, from left to right: the TurtleBot on the ground, the DJI Spark in the air, and the DJI Spark on a desktop. The collected aerial and ground data are 1080p videos at 25 FPS. The two indoor scene data sets are called Room and Hall; some information about them is given in Table 1. Example aerial images and the generated 3D aerial maps of the Room and Hall data sets are shown in Fig. 2 and Fig. 14 respectively. As Fig. 2 and Fig. 14 show, the aerial map of the Hall data set has lower quality and larger scale than that of the Room data set. Nevertheless, the subsequent evaluation shows that the method of the invention obtains the expected results on both data sets, indicating good robustness and scalability.
Table 1
Data set                   Room    Hall
Aerial video length (s)    218     494
Ground video length (s)    61      113
Coverage area (m²)         30      130
In addition, the virtual camera pose computation and robot path planning results on the Room and Hall data sets are shown at the rightmost of Fig. 2 and Fig. 14 respectively. As shown, with the method of the invention, the ground plane used for virtual camera pose computation and robot path planning is successfully detected, and the generated virtual cameras and robot paths cover the indoor scenes fairly uniformly. The virtual camera pose computation method of the invention generates 60 and 384 virtual cameras on the Room and Hall data sets respectively. In Fig. 14, the first three columns show example aerial images and the corresponding regions of the 3D aerial map; the fourth column shows the entire 3D aerial map; the fifth column shows the robot path planning and virtual camera pose computation results on the aerial map, where the ground plane is marked light gray, the planned path is marked by line segments, and virtual camera poses are indicated by pyramids.
2. Adaptive frame extraction results
With the adaptive frame extraction method of the present invention, 271 and 112 frames were extracted from the aerial and ground videos of the Room dataset, respectively, and 721 and 250 frames were extracted from the aerial and ground videos of the Hall dataset. To verify the effectiveness of the frame extraction method of the present invention, a comparative experiment between the method of the present invention and equal-interval frame extraction was carried out on the aerial video of the Hall dataset. The adaptive frame extraction method of the invention extracted 721 frames from the 494 s, 25 FPS video; for equal-interval extraction, one frame was extracted every 17 frames (494 × 25 / 721 ≈ 17), for a total of 730 frames. The video frames obtained by the two extraction methods were then calibrated by the open-source SfM system COLMAP, with the results shown in Fig. 15. Left: the COLMAP result for the adaptively extracted frames, in which the video frames are successfully calibrated. Middle and right: the COLMAP results for the equal-interval extracted frames, in which the video frames are calibrated but broken into two parts; the middle and right figures correspond to the two rectangular regions in the left figure. The circled portions in the left and right figures show the comparison at the same corner. As Fig. 15 shows, the video frames extracted by the method of the present invention have better connectivity than those of the equal-interval method, so a consistent aerial map can be obtained by reconstructing from them. In addition, the black circles in Fig. 15 show that, to obtain a more complete aerial map, denser frame extraction of the video is needed at corners.
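The adaptive extraction compared above can be sketched as follows: walk through the video and keep a frame whenever its visual similarity to the last kept frame drops below a threshold, so fast camera motion (e.g., turning at a corner) automatically yields denser sampling. The patent's method is based on bag-of-words similarity; in this hypothetical sketch the ratio-test match rate of ORB features stands in as a simpler similarity score, and the threshold values are assumptions.

```python
import cv2

def adaptive_extract(video_path, sim_thresh=0.35, max_gap=50):
    # Keep a frame when its similarity to the last kept frame falls
    # below sim_thresh, or after max_gap frames regardless; return
    # the indices of the kept frames.
    orb = cv2.ORB_create(1000)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    cap = cv2.VideoCapture(video_path)
    kept, last_des, gap, idx = [], None, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, des = orb.detectAndCompute(gray, None)
        if des is not None and len(des) >= 2:
            if last_des is None:
                kept.append(idx)
                last_des, gap = des, 0
            else:
                pairs = bf.knnMatch(des, last_des, k=2)
                good = [m for m in pairs
                        if len(m) == 2 and m[0].distance < 0.7 * m[1].distance]
                sim = len(good) / max(len(pairs), 1)
                gap += 1
                if sim < sim_thresh or gap >= max_gap:
                    kept.append(idx)
                    last_des, gap = des, 0
        idx += 1
    cap.release()
    return kept
```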
3. Ground camera localization results
To verify the batch camera localization and the aerial and ground image fusion method of the present invention, the batch camera localization results and the camera localization results after fusing the aerial and ground images are here compared qualitatively and quantitatively with the COLMAP results. It should be noted that for COLMAP the ground camera poses were not initialized and calibration was performed from the images alone; that is, the camera localization results obtained from the aerial map in step S300 were not supplied to COLMAP as a prior.
The qualitative comparison results are shown in Fig. 16. First row: Room dataset results; second row: Hall dataset results; from left to right: results after fusing the aerial and ground images, batch ground camera localization results, and COLMAP calibration results; rectangles mark erroneous camera poses. As Fig. 16 shows, for the Room dataset the camera poses obtained by the three compared methods are quite similar, because the scene structure of the Room dataset is relatively simple. For the Hall dataset, however, the camera trajectory computed by COLMAP exhibits obvious errors in the left part of the scene. This is because repeated and weak textures cause the matches between ground images to contain many outliers, which leads the incremental SfM system to produce apparent scene drift. In contrast, for the batch camera localization, since part of the ground images are initially localized against the aerial map, the localization result exhibits only slight scene drift. Moreover, those erroneous camera poses are corrected in the subsequent aerial and ground image fusion stage, because the aerial and ground feature point tracks generated by connection are introduced into the global optimization during image fusion. The above results show that ground camera localization by fusing aerial and ground images is more robust than localization using ground images alone.
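A minimal sketch of the initial localization step discussed above: a ground frame is matched against a synthesized reference image whose pixels carry known 3D coordinates in the aerial map, turning 2D-2D matches into 2D-3D correspondences that a RANSAC PnP solve converts into a camera pose in the map frame. The feature type, depth lookup, and intrinsics handling here are illustrative assumptions, not the patent's exact pipeline.

```python
import cv2
import numpy as np

def localize_against_synthetic(ground_img, synth_img, synth_xyz, K):
    # ground_img and synth_img are 8-bit grayscale images; synth_xyz[v, u]
    # holds the 3D aerial-map point rendered at pixel (u, v) of the
    # synthesized reference image; K is the ground camera intrinsic
    # matrix. Returns (R, t) in the aerial map frame, or None on failure.
    sift = cv2.SIFT_create()
    kg, dg = sift.detectAndCompute(ground_img, None)
    ks, ds = sift.detectAndCompute(synth_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(dg, ds, k=2)
    pts2d, pts3d = [], []
    for m in pairs:
        if len(m) == 2 and m[0].distance < 0.75 * m[1].distance:
            u, v = ks[m[0].trainIdx].pt
            pts3d.append(synth_xyz[int(round(v)), int(round(u))])
            pts2d.append(kg[m[0].queryIdx].pt)
    if len(pts3d) < 4:
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, np.float32), np.asarray(pts2d, np.float32),
        K, None, reprojectionError=4.0)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec
```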
4. Indoor scene reconstruction results
Finally, the indoor scene reconstruction algorithm of the present invention was evaluated qualitatively and quantitatively. This test compares the indoor reconstruction results of the present invention with the results reconstructed from aerial or ground images alone. The qualitative comparison results are shown in Fig. 17. First column: Room dataset results; second column: enlarged views of the rectangular regions in the first column; third column: Hall dataset results; fourth column: enlarged views of the rectangular regions in the third column. From top to bottom: results using ground images only, using aerial images only, and using the fusion of aerial and ground images. It should be noted that (1) for the indoor reconstruction algorithm of the present invention, the camera poses used are those after fusing the aerial and ground images; (2) for the method using ground images only, the camera poses used are those after batch camera localization; (3) for the method using aerial images only, the camera poses used are those estimated by SfM. As Fig. 17 shows, although some regions are still inevitably missing from the reconstruction results due to occlusion and weak texture, the indoor reconstruction results obtained by fusing the aerial and ground images are more complete than those reconstructed from a single type of image alone.
A scene modeling system fusing aerial and ground multi-view images according to a second embodiment of the present invention comprises an aerial map construction module, a composite image acquisition module, a multi-view image set acquisition module, and an indoor scene model acquisition module;
the aerial map construction module is configured to obtain aerial multi-view images of the indoor scene to be modeled and construct an aerial map;
the composite image acquisition module is configured to, based on the aerial map, obtain composite images by the method of synthesizing ground-view reference images from the aerial map;
the multi-view image set acquisition module is configured to obtain a ground multi-view image set from the ground multi-view images acquired by the ground camera;
the indoor scene model acquisition module is configured to, based on the composite images, fuse the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the system described above and the related explanations, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
It should be noted that the scene modeling system fusing aerial and ground multi-view images provided by the above embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps of the embodiments of the present invention may be decomposed or recombined. For example, the modules of the above embodiment may be merged into one module, or further split into multiple sub-modules, to accomplish all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps and are not to be regarded as improper limitations of the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above scene modeling method fusing aerial and ground multi-view images.
A processing device according to a fourth embodiment of the present invention comprises a processor adapted to execute programs, and a storage device adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the above scene modeling method fusing aerial and ground multi-view images.
Those of ordinary skill in the art can clearly understand that, for convenience and brevity of description, for the specific working processes and related explanations of the storage device and processing device described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Those skilled in the art should recognize that the modules and method steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; programs corresponding to software modules and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium well known in the art. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of functionality. Whether these functions are implemented in electronic hardware or software depends on the particular application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
Term " first ", " second " etc. are to be used to distinguish similar objects, rather than be used to describe or indicate specific suitable Sequence or precedence.
Term " includes " or any other like term are intended to cover non-exclusive inclusion, so that including a system Process, method, article or equipment/device of column element not only includes those elements, but also including being not explicitly listed Other elements, or further include the intrinsic element of these process, method, article or equipment/devices.
So far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

Claims (10)

1. A scene modeling method fusing aerial and ground multi-view images, characterized by comprising the following steps:
step S100, obtaining aerial multi-view images of an indoor scene to be modeled, and constructing an aerial map;
step S200, based on the aerial map, obtaining composite images by a method of synthesizing ground-view reference images from the aerial map;
step S300, obtaining a ground multi-view image set from ground multi-view images acquired by a ground camera;
step S400, based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model.
2. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that in step S100, the method of "obtaining aerial multi-view images of an indoor scene to be modeled, and constructing an aerial map" is:
extracting image frames from an aerial multi-view video of the indoor scene using an adaptive video frame extraction method based on bag-of-words, to obtain an aerial multi-view image set of the indoor scene;
constructing the aerial map from the aerial multi-view image set by an image-based modeling method.
3. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that in step S200, the "method of synthesizing ground-view reference images from the aerial map" is:
calculating virtual camera poses based on the aerial map;
obtaining, by a graph-cut algorithm, the composite images of the ground-view reference images from the aerial map.
4. The scene modeling method fusing aerial and ground multi-view images according to claim 3, characterized in that the method of "obtaining, by a graph-cut algorithm, the composite images of the ground-view reference images from the aerial map" is minimizing the energy function

E(l) = Σ_{ti∈T} Di(li) + Σ_{(ti,tj)∈C} Vi(li, lj)

wherein E(l) is the energy function of the graph cut; T is the set of 2D triangles obtained by projecting the 3D space mesh visible to the virtual camera, and ti is the i-th triangle in it; C is the set of common edges of the projected 2D triangles; li is the aerial image index of ti; Di(li) is the data term; Vi(li, lj) is the smoothness term;
when the space patch corresponding to ti is visible in the li-th aerial image, the data term is Di(li) = si(li)/ai(li), and otherwise Di(li) = α, where si(li) is the median scale of the local features in the li-th aerial image, ai(li) is the projected area of the space patch corresponding to ti in the li-th aerial image, and α is a large constant;
when li = lj, the smoothness term Vi(li, lj) = 0; otherwise Vi(li, lj) = 1 (an illustrative sketch of minimizing such an energy follows the claims).
5. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that in step S300, the method of "obtaining a ground multi-view image set from ground multi-view images acquired by a ground camera" is:
a ground robot, following a planned path, continuously acquiring a ground multi-view video through the ground camera arranged on it;
extracting image frames from the ground multi-view video of the indoor scene using the adaptive video frame extraction method based on bag-of-words, to obtain the ground multi-view image set of the indoor scene.
6. The scene modeling method fusing aerial and ground multi-view images according to claim 5, characterized in that, in the process of "a ground robot, following a planned path, continuously acquiring a ground multi-view video through the ground camera arranged on it", the localization method comprises initial robot localization and mobile robot localization;
the initial robot localization method is: obtaining the initial position of the robot in the aerial map from the first frame of the video acquired by the ground camera, and using this position as the starting point of the robot's subsequent motion;
the mobile robot localization method is: performing coarse localization of the robot's position based on the initial position and the robot's motion data at each moment, obtaining the robot's current position in the aerial map by matching the video frame acquired at the current moment against the composite images, and correcting the coarse localization with this position.
7. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that the method of step S400, "based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model", is:
obtaining the position in the aerial map of the ground camera corresponding to each image in the ground multi-view image set;
connecting the match points between the ground multi-view images and the composite images into original aerial and ground feature point tracks, generating cross-view constraints;
optimizing the aerial map and the ground multi-view image point cloud by bundle adjustment;
performing dense reconstruction with the aerial map and the ground multi-view images to obtain a dense model of the indoor scene.
8. A scene modeling system fusing aerial and ground multi-view images, characterized in that the system comprises an aerial map construction module, a composite image acquisition module, a multi-view image set acquisition module, and an indoor scene model acquisition module;
the aerial map construction module is configured to obtain aerial multi-view images of an indoor scene to be modeled and construct an aerial map;
the composite image acquisition module is configured to, based on the aerial map, obtain composite images by a method of synthesizing ground-view reference images from the aerial map;
the multi-view image set acquisition module is configured to obtain a ground multi-view image set from ground multi-view images acquired by a ground camera;
the indoor scene model acquisition module is configured to, based on the composite images, fuse the aerial multi-view images with the ground multi-view images to obtain an indoor scene model.
9. A storage device storing a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the scene modeling method fusing aerial and ground multi-view images according to any one of claims 1-7.
10. A processing device, comprising a processor adapted to execute programs, and a storage device adapted to store a plurality of programs, characterized in that the programs are adapted to be loaded and executed by the processor to implement the scene modeling method fusing aerial and ground multi-view images according to any one of claims 1-7.
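As referenced in claim 4, the following is an illustrative sketch (not the patent's implementation) of minimizing an energy of the form E(l) = Σ Di(li) + Σ Vi(li, lj) with a Potts smoothness term. Graph-cut pipelines typically use alpha-expansion; here iterated conditional modes stands in as a much simpler approximate minimizer, and the data-cost matrix and edge list are assumed inputs.

```python
import numpy as np

def icm_labeling(data_cost, edges, n_iter=10):
    # data_cost[i, l]: D_i(l) for triangle i and aerial image label l;
    # edges: list of (i, j) triangle pairs sharing a common edge, with
    # Potts smoothness V = 0 if the labels agree and 1 otherwise.
    n, L = data_cost.shape
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    labels = data_cost.argmin(axis=1)
    for _ in range(n_iter):
        for i in range(n):
            # Cost of each candidate label = data term plus the number
            # of neighbors whose current label disagrees with it.
            cost = data_cost[i].copy()
            for j in nbrs[i]:
                cost += 1.0
                cost[labels[j]] -= 1.0
            labels[i] = cost.argmin()
    return labels
```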
CN201910502762.4A 2019-06-11 2019-06-11 Scene modeling method, system and device fusing aerial photography and ground visual angle images Active CN110223380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910502762.4A CN110223380B (en) 2019-06-11 2019-06-11 Scene modeling method, system and device fusing aerial photography and ground visual angle images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910502762.4A CN110223380B (en) 2019-06-11 2019-06-11 Scene modeling method, system and device fusing aerial photography and ground visual angle images

Publications (2)

Publication Number Publication Date
CN110223380A true CN110223380A (en) 2019-09-10
CN110223380B CN110223380B (en) 2021-04-23

Family

ID=67816534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910502762.4A Active CN110223380B (en) 2019-06-11 2019-06-11 Scene modeling method, system and device fusing aerial photography and ground visual angle images

Country Status (1)

Country Link
CN (1) CN110223380B (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332383B (en) * 2022-03-17 2022-06-28 青岛市勘察测绘研究院 Scene three-dimensional modeling method and device based on panoramic video


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110181589A1 (en) * 2010-01-28 2011-07-28 The Hong Kong University Of Science And Technology Image-based procedural remodeling of buildings
CN104205826A (en) * 2012-04-03 2014-12-10 三星泰科威株式会社 Apparatus and method for reconstructing high density three-dimensional image
CN103198524A (en) * 2013-04-27 2013-07-10 清华大学 Three-dimensional reconstruction method for large-scale outdoor scene
US20160005145A1 (en) * 2013-11-27 2016-01-07 Google Inc. Aligning Ground Based Images and Aerial Imagery
CN104021586A (en) * 2014-05-05 2014-09-03 深圳市城市管理监督指挥中心 Air-ground integrated city ecological civilization managing system and method based on Beidou positioning
JP2016045825A (en) * 2014-08-26 2016-04-04 三菱重工業株式会社 Image display system
CN105843223A (en) * 2016-03-23 2016-08-10 东南大学 Mobile robot three-dimensional mapping and obstacle avoidance method based on space bag of words model
CN107223344A (en) * 2017-01-24 2017-09-29 深圳大学 The generation method and device of a kind of static video frequency abstract
CN107356230A (en) * 2017-07-12 2017-11-17 深圳市武测空间信息有限公司 A kind of digital mapping method and system based on outdoor scene threedimensional model
CN109472865A (en) * 2018-09-27 2019-03-15 北京空间机电研究所 It is a kind of based on iconic model draw freedom can measure panorama reproducting method
CN109862275A (en) * 2019-03-28 2019-06-07 Oppo广东移动通信有限公司 Electronic equipment and mobile platform

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
OZGE C. OZCANLI ET AL.: "Geo-localization using Volumetric Representations of Overhead Imagery", 《INTERNATIONAL JOURNAL OF COMPUTER VISION》 *
XIANG GAO ET AL.: "Accurate and efficient ground-to-aerial model alignment", 《PATTERN RECOGNITION》 *
XIANG GAO ET AL.: "Ground and aerial meta-data integration for localization and reconstruction: A review", 《PATTERN RECOGNITION LETTERS》 *
XIANG GAO ET AL.: "Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds", 《ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING》 *
XU XIMEI ET AL.: "UAV autonomous localization technology assisted by ground scene information", 《COMPUTER MEASUREMENT & CONTROL》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112815923A (en) * 2019-11-15 2021-05-18 华为技术有限公司 Visual positioning method and device
US20220375220A1 (en) * 2019-11-15 2022-11-24 Huawei Technologies Co., Ltd. Visual localization method and apparatus
CN111723681A (en) * 2020-05-28 2020-09-29 北京三快在线科技有限公司 Indoor road network generation method and device, storage medium and electronic equipment
CN111723681B (en) * 2020-05-28 2024-03-08 北京三快在线科技有限公司 Indoor road network generation method and device, storage medium and electronic equipment
CN112539758A (en) * 2020-12-16 2021-03-23 中铁大桥勘测设计院集团有限公司 Ground route drawing method and system in aerial video
CN112539758B (en) * 2020-12-16 2023-06-02 中铁大桥勘测设计院集团有限公司 Ground line drawing method and system in aerial video
CN112651881A (en) * 2020-12-30 2021-04-13 北京百度网讯科技有限公司 Image synthesis method, apparatus, device, storage medium, and program product
CN112651881B (en) * 2020-12-30 2023-08-01 北京百度网讯科技有限公司 Image synthesizing method, apparatus, device, storage medium, and program product
WO2023284715A1 (en) * 2021-07-15 2023-01-19 华为技术有限公司 Object reconstruction method and related device

Also Published As

Publication number Publication date
CN110223380B (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN110223380A (en) Fusion is taken photo by plane and the scene modeling method of ground multi-view image, system, device
CN108564616B (en) Fast robust RGB-D indoor three-dimensional scene reconstruction method
CN113985445B (en) 3D target detection algorithm based on camera and laser radar data fusion
Ding et al. Automatic registration of aerial imagery with untextured 3d lidar models
Ding et al. Vehicle pose and shape estimation through multiple monocular vision
Ma Building model reconstruction from LiDAR data and aerial photographs
Gao et al. Ground and aerial meta-data integration for localization and reconstruction: A review
CN117315146B (en) Reconstruction method and storage method of three-dimensional model based on trans-scale multi-source data
Özdemir et al. A multi-purpose benchmark for photogrammetric urban 3D reconstruction in a controlled environment
CN113345072A (en) Multi-view remote sensing topographic image point cloud reconstruction method and system
Byrne et al. Maximizing feature detection in aerial unmanned aerial vehicle datasets
Wang et al. Building3d: A urban-scale dataset and benchmarks for learning roof structures from point clouds
Sun et al. Geographic, geometrical and semantic reconstruction of urban scene from high resolution oblique aerial images.
Rothermel et al. Fast and robust generation of semantic urban terrain models from UAV video streams
Hu et al. A combined clustering and image mapping based point cloud segmentation for 3D object detection
Bai et al. Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration
Hwang et al. 3D modeling and accuracy assessment-a case study of photosynth
Ekholm 3-D scene reconstruction from aerial imagery
Oesau Geometric modeling of indoor scenes from acquired point data
Naimaee et al. Automatic extraction of control points from 3D Lidar mobile mapping and UAV imagery for aerial triangulation
Wendel et al. Visual Localization for Micro Aerial Vehicles in Urban Outdoor Environments
Mostofi 3D Indoor Mobile Mapping using Multi-Sensor Autonomous Robot
Mohammed Fusion of Terrestrial and Airborne Laser Data for 3D Modeling Applications
Zhang Targetless Camera-LiDAR Calibration for Autonomous Systems
Pereira High precision monocular visual odometry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant