CN110223380A - Scene modeling method, system and device fusing aerial and ground multi-view images - Google Patents
Scene modeling method, system and device fusing aerial and ground multi-view images
- Publication number
- CN110223380A (application CN201910502762.4A)
- Authority
- CN
- China
- Prior art keywords
- aerial
- ground
- multi-view image
- scene modeling
- map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention belongs to the field of scene modeling, and in particular relates to a scene modeling method, system and device fusing aerial and ground multi-view images, aiming to solve the problem that image-based modeling results are incomplete and inaccurate for indoor scenes with complex structure and lacking texture. The method of the invention includes: S100, obtaining aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map; S200, based on the aerial map, obtaining composite images by synthesizing ground-view reference images from the aerial map; S300, acquiring ground multi-view images with a ground camera to obtain a ground multi-view image set; S400, based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model. The invention can generate a complete and accurate indoor scene model, balances acquisition efficiency and reconstruction accuracy, and has strong robustness.
Description
Technical field
The invention belongs to the field of scene modeling, and in particular relates to a scene modeling method, system and device fusing aerial and ground multi-view images.
Background art
Indoor scene three-dimensional reconstruction plays an important role in many practical applications, such as indoor navigation, service robots, and building information modeling (BIM). Existing indoor scene reconstruction methods can be roughly divided into three classes: (1) methods based on laser radar (LiDAR, light detection and ranging), (2) methods based on RGB-D cameras, and (3) methods based on images.
Although LiDAR-based and RGB-D-based methods achieve higher precision, both suffer from high cost and poor scalability when reconstructing larger indoor scenes. For LiDAR-based methods, occlusion caused by the limited scanning viewpoint is difficult to avoid, so multi-view laser scanning and point cloud alignment are generally required. For RGB-D-based methods, the limited effective working distance of the sensor means that large amounts of data must be acquired and processed. Therefore, the above methods are costly and inefficient for large-scale indoor scene reconstruction.
Compared with LiDAR-based and RGB-D-based methods, image-based methods are cheaper and more flexible, but they also have shortcomings, such as incomplete and inaccurate reconstruction caused by complex scenes, repetitive structures, and lack of texture. Even with the current state-of-the-art structure from motion (SfM) and multi-view stereo (MVS) techniques, the reconstruction of large, structurally complex indoor scenes remains unsatisfactory. In addition, some image-based methods use prior assumptions, such as the Manhattan-world assumption, to handle the indoor reconstruction problem. Although such methods can sometimes obtain better results, they often lead to erroneous reconstructions when the priors are not satisfied.
Summary of the invention
In order to solve the above problem in the prior art, namely that image-based modeling results are incomplete and inaccurate for indoor scenes with complex structure and lacking texture, a first aspect of the invention proposes a scene modeling method fusing aerial and ground multi-view images, comprising the following steps:
Step S100, obtaining aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map;
Step S200, based on the aerial map, obtaining composite images by synthesizing ground-view reference images from the aerial map;
Step S300, acquiring ground multi-view images with a ground camera to obtain a ground multi-view image set;
Step S400, based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model.
In some preferred embodiments, in step S100, "obtaining aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map" is performed as follows:
for the aerial multi-view video of the indoor scene, extracting image frames with an adaptive video frame extraction method based on a bag-of-words model, to obtain the aerial multi-view image set of the indoor scene;
based on the aerial multi-view image set, constructing the aerial map with an image-based modeling method.
In some preferred embodiments, in step S200, "synthesizing ground-view reference images from the aerial map" is performed as follows:
computing virtual camera poses based on the aerial map;
obtaining the composite images of the ground-view reference images from the aerial map with a graph-cut algorithm.
In some preferred embodiments, "obtaining the composite images of the ground-view reference images from the aerial map with a graph-cut algorithm" is performed by minimizing

E(l) = \sum_{t_i \in T} D_i(l_i) + \sum_{(t_i, t_j) \in N} V_{i,j}(l_i, l_j)

wherein E(l) is the energy function of the graph cut; T is the set of two-dimensional triangles obtained by projecting the three-dimensional space mesh visible to the virtual camera, and t_i is the i-th triangle therein; N is the set of shared edges between triangles in the projected two-dimensional triangle set; l_i is the aerial image label of t_i; D_i(l_i) is the data term; V_{i,j}(l_i, l_j) is the smoothness term;
when the space facet corresponding to t_i is visible in the l_i-th aerial image, the data term D_i(l_i) = s_{l_i} / A_i(l_i); otherwise D_i(l_i) = α, where s_{l_i} is the median scale of local features in the l_i-th aerial image, A_i(l_i) is the projected area of the space facet corresponding to t_i in the l_i-th aerial image, and α is a large constant;
when l_i = l_j, the smoothness term V_{i,j}(l_i, l_j) = 0; otherwise V_{i,j}(l_i, l_j) = 1.
In some preferred embodiments, in step S300, "acquiring ground multi-view images with a ground camera to obtain the ground multi-view image set" is performed as follows:
the ground robot, following a planned path, continuously acquires the ground multi-view video through the ground camera mounted on it;
for the ground multi-view video of the indoor scene, extracting image frames with the adaptive video frame extraction method based on the bag-of-words model, to obtain the ground multi-view image set of the indoor scene.
In some preferred embodiments, during "the ground robot, following a planned path, continuously acquiring the ground multi-view video through the ground camera mounted on it", the localization method includes initial robot localization and localization during robot motion;
the initial robot localization is performed as follows: the first frame of the video acquired by the ground camera is used to obtain the initial position of the robot in the aerial map, and this position is taken as the starting point of the robot's subsequent motion;
the localization during robot motion is performed as follows: a coarse robot position is obtained from the initial position and the robot's motion data at each moment; by matching the video frame acquired at the current moment against the composite images, the robot's current position in the aerial map is obtained, and the coarse position is corrected with this position information.
In some preferred embodiments, step S400, "based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model", is performed as follows:
obtaining the position in the aerial map of the ground camera corresponding to each image in the ground multi-view image set;
connecting the match points between ground multi-view images and composite images into the original aerial and ground feature point tracks, generating cross-view constraints;
optimizing the aerial and ground image poses by bundle adjustment (BA, bundle adjustment);
performing dense reconstruction with the aerial and ground multi-view images to obtain a dense model of the indoor scene.
A second aspect of the invention proposes a scene modeling system fusing aerial and ground multi-view images, the system comprising an aerial map construction module, a composite image acquisition module, a multi-view image set acquisition module, and an indoor scene model acquisition module;
the aerial map construction module is configured to obtain aerial multi-view images of the indoor scene to be modeled and construct the aerial map;
the composite image acquisition module is configured to, based on the aerial map, obtain the composite images by synthesizing ground-view reference images from the aerial map;
the multi-view image set acquisition module is configured to obtain the ground multi-view image set from the ground multi-view images acquired by the ground camera;
the indoor scene model acquisition module is configured to, based on the composite images, fuse the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
A third aspect of the invention proposes a storage device in which a plurality of programs are stored, the programs being suitable for being loaded and executed by a processor to realize the above scene modeling method fusing aerial and ground multi-view images.
A fourth aspect of the invention proposes a processing device, comprising a processor suitable for executing programs and a storage device suitable for storing a plurality of programs, the programs being suitable for being loaded and executed by the processor to realize the above scene modeling method fusing aerial and ground multi-view images.
Beneficial effects of the invention:
The invention constructs a three-dimensional aerial map to guide a robot moving through the indoor scene while acquiring ground-view images, then fuses the aerial and ground images, and generates a complete and accurate indoor scene model from the fused images. The indoor scene reconstruction pipeline of the invention balances acquisition efficiency and reconstruction accuracy, and it has strong robustness.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-restrictive embodiments made with reference to the accompanying drawings:
Fig. 1 is a schematic flow diagram of the scene modeling method fusing aerial and ground multi-view images according to an embodiment of the invention;
Fig. 2 is an example of an aerial map reconstructed from 271 extracted video frames in an embodiment of the invention;
Fig. 3 is a schematic diagram of mesh-based image synthesis in an embodiment of the invention;
Fig. 4 is an example of the relationship between local feature scale and image sharpness in an embodiment of the invention;
Fig. 5 is an example of graph-cut image synthesis results under different configurations in an embodiment of the invention;
Fig. 6 shows further image synthesis results and, for comparison, ground images under similar viewpoints;
Fig. 7 is an example of image matching results in an embodiment of the invention;
Fig. 8 is a schematic diagram of the candidate matching composite image search during robot motion in an embodiment of the invention;
Fig. 9 is a schematic flow diagram of batch camera localization in an embodiment of the invention;
Fig. 10 is an example of batch camera localization results based on three kinds of feature point tracks in an embodiment of the invention;
Fig. 11 is an example of the batch camera localization process in an embodiment of the invention;
Fig. 12 is a schematic diagram of generating aerial-and-ground feature point tracks for the aerial views in an embodiment of the invention;
Fig. 13 shows the data acquisition equipment used in the tests of an embodiment of the invention;
Fig. 14 shows example aerial images of the Hall dataset and the generated three-dimensional aerial map in the tests of an embodiment of the invention;
Fig. 15 is an example of the comparison experiment between the frame extraction method of the invention and fixed-interval frame extraction on the aerial video of the Hall dataset in the tests of an embodiment of the invention;
Fig. 16 is an example of the qualitative comparison results of ground camera localization in the tests of an embodiment of the invention;
Fig. 17 is an example of qualitative indoor scene reconstruction results in the tests of an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the invention. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the invention.
The application is described in further detail below with reference to the accompanying drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the related invention, rather than to limit the invention. It should also be noted that, for convenience of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in the embodiments can be combined with each other.
Due to the complexity of indoor scenes, image-based methods must consider the following two problems to achieve complete scene reconstruction. The first is image acquisition: how to acquire images that cover the indoor scene completely and efficiently. The second is the scene reconstruction algorithm: how to fuse images of different viewpoints in SfM and MVS to obtain complete, accurate reconstruction results. For these two problems, the invention proposes a novel image-based indoor scene acquisition and reconstruction pipeline. The pipeline uses a mini aerial vehicle and a ground robot and comprises four key steps (as shown in Fig. 1): (1) aerial map construction: a mini aerial vehicle acquires aerial multi-view images indoors, from which a three-dimensional mesh characterizing the indoor scene is obtained and used as the map for ground robot localization and navigation; (2) reference image synthesis: plane detection is performed on the aerial map to obtain the ground plane, which is used for ground robot path planning; then several ground-view images are synthesized based on the aerial map for ground robot localization; (3) ground robot localization: the ground robot enters the indoor scene and acquires ground multi-view images; while the robot acquires images during its motion, it is localized by matching the acquired images against the synthesized ground-view images; (4) indoor scene reconstruction: after the ground robot completes image acquisition, the aerial vehicle images and the ground robot images are fused through an image-based modeling pipeline, realizing complete and accurate modeling of the indoor scene.
In the modeling pipeline of the invention, manual operation is needed only during aerial image acquisition; the subsequent ground image acquisition and indoor scene modeling are realized fully automatically, which means the pipeline of the invention is highly scalable and suitable for the acquisition and reconstruction of large-scale indoor scenes. Aerial images could also be collected automatically by autonomous navigation along an acquired flight path, but this increases the complexity of the algorithm, so manual operation is preferred to guarantee the flexibility, completeness and scalability of image acquisition.
Compared with the ground images acquired by the ground robot, the aerial images acquired by the mini aerial vehicle have better viewpoints and larger fields of view, which means that, relative to ground images, occlusion and mismatching problems in aerial images are smaller. Therefore, the map generated from aerial images can be used more reliably in the subsequent ground robot localization process.
The aerial images captured by the mini aerial vehicle and the ground images captured by the ground robot complement each other and together can cover the indoor scene completely. Therefore, by fusing aerial and ground images, a more complete and accurate indoor scene model can be obtained.
A scene modeling method fusing aerial and ground multi-view images according to the invention comprises the following steps:
Step S100, obtaining aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map;
Step S200, based on the aerial map, obtaining composite images by synthesizing ground-view reference images from the aerial map;
Step S300, acquiring ground multi-view images with a ground camera to obtain a ground multi-view image set;
Step S400, based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model.
To describe the scene modeling method fusing aerial and ground multi-view images of the invention more clearly, each step of an embodiment of the method of the invention is described in detail below with reference to the accompanying drawings.
The scene modeling method fusing aerial and ground multi-view images according to an embodiment of the invention includes steps S100-S400.
Step S100, obtaining aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map.
First, an aerial video of the indoor scene is acquired with a mini aerial vehicle, and images are extracted from the video. Then the extracted images are reconstructed with an image-based modeling pipeline to obtain the aerial model, which serves as the three-dimensional map for ground robot localization.
Step S101, for the aerial multi-view video of the indoor scene, extracting image frames with the adaptive video frame extraction method based on a bag-of-words model, to obtain the aerial multi-view image set of the indoor scene.
In this embodiment, a top-down aerial multi-view video of the indoor scene is acquired by a mini aerial vehicle; the acquired video resolution is 1080p and the frame rate is 25 FPS. Since a mini unmanned aerial vehicle is small and highly maneuverable, it is well suited for indoor scene capture. For example, the mini aerial vehicle used in this embodiment is a DJI Spark equipped with a stabilizer and a 4K camera, weighing only 300 g. In addition, shooting the indoor scene from an aerial viewpoint is less susceptible to scene occlusion than shooting from a ground viewpoint, so a mini aerial vehicle can cover the scene more efficiently and completely.
Given the acquired aerial video, the aerial map could be built by a simultaneous localization and mapping (SLAM) system. However, in this embodiment, offline SfM is used for aerial map construction, because: (1) in this embodiment the aerial map is used for ground robot localization and therefore need not be constructed online; (2) compared with SLAM, which is prone to scene drift, SfM is more suitable for large-scale scene modeling. When SfM is used for aerial map construction, however, clearly not all frames of the aerial video need to be used, because aerial video frames contain a large amount of redundancy, which severely reduces the efficiency of SfM map construction. To solve this problem, a direct method is to extract one frame at fixed frame intervals in the video and then construct the map with the extracted video frames. However, this approach still has some disadvantages: (1) it is difficult to achieve stable, constant-speed video acquisition in an indoor scene with a manually operated mini aerial vehicle, and this problem becomes more difficult at flight path corners; (2) since the texture richness in an indoor scene is inconsistent, covering the scene uniformly is also inappropriate. To solve the above problems of the aerial map construction process, an adaptive video frame extraction method based on the bag-of-words (BoW) model is used in this embodiment; the process is detailed as follows:
In the BoW model, an image can be represented as a normalized vector v_i, and the similarity of a pair of images can be expressed by the dot product of the corresponding vectors, s_{i,j} = v_i · v_j. As known to those skilled in the art, too high a similarity between adjacent images introduces excessive redundancy and thereby reduces mapping efficiency, while too low a similarity between adjacent images leads to poor connectivity between images and incomplete mapping. Therefore, this embodiment proposes a method of adaptively extracting a subset from all video frames which, during extraction, keeps the similarity between each extracted video frame and the adjacently extracted video frame within a suitable range. Specifically, the normalized vector v_i of each frame is first generated with the libvot library, and the first frame is taken as the starting point. During frame extraction, assuming the current i-th frame has been extracted, the similarity scores between this frame and its subsequent frames are obtained: {s_{i,j} | j = i+1, i+2, ...}, where s_{i,j} = v_i · v_j. Then s_{i,j} is compared with a preset similarity threshold t (t = 0.1 in this embodiment). Let j* be the first index satisfying the inequality s_{i,j} < t; then frame j* - 1 (i.e., the frame immediately before the first frame satisfying the inequality) is the next extracted video frame. The above process iterates until all video frames have been examined.
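For illustration, the following is a minimal sketch of this adaptive frame extraction, assuming the normalized BoW vectors of all frames have already been computed (e.g., by a vocabulary-tree tool such as libvot); the function and variable names are illustrative, not part of the original disclosure:

```python
import numpy as np

def adaptive_frame_extraction(bow_vectors, t=0.1):
    """Select frames so that each kept frame and the next kept frame
    have BoW similarity just above the threshold t.

    bow_vectors: (num_frames, dim) array of normalized BoW vectors.
    Returns the indices of the extracted frames.
    """
    n = len(bow_vectors)
    keyframes = [0]                      # the first frame is the starting point
    i = 0
    while True:
        j_star = None
        for j in range(i + 1, n):
            s_ij = float(np.dot(bow_vectors[i], bow_vectors[j]))
            if s_ij < t:                 # first subsequent frame below threshold
                j_star = j
                break
        if j_star is None:               # remaining frames are all similar: stop
            break
        nxt = max(j_star - 1, i + 1)     # frame just before it; always advance
        keyframes.append(nxt)
        i = nxt
    return keyframes
```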
Step S102, based on the aerial multi-view image set, constructing the aerial map with an image-based modeling method.
Based on the aerial multi-view image set obtained in step S101, the aerial map is constructed with a standard image-based modeling pipeline comprising: (1) SfM, (2) MVS, (3) surface reconstruction. In addition, since GPS signals cannot be received indoors, the aerial map can be scaled to its actual physical size with ground control points (GCP). Fig. 2 shows an example aerial map reconstructed from 271 extracted video frames; the first three columns of the figure show example aerial images and their corresponding regions of the three-dimensional aerial map, the fourth column shows the entire three-dimensional aerial map, and the fifth column shows the robot path planning and virtual camera pose computation results on the aerial map, where the ground plane is marked in light gray, the planned path is marked by the line segments in the figure, and the virtual camera poses are indicated by pyramids.
Step S200, based on the aerial map, obtaining composite images by synthesizing ground-view reference images from the aerial map.
The aerial map constructed in step S100 of this embodiment plays two roles in the subsequent processes: first, it is used for ground robot path planning and for localization during robot motion; second, it helps fuse aerial and ground images during indoor scene reconstruction. Both processes require establishing 2D-to-3D point correspondences between the ground images and the aerial map. To obtain such correspondences, one effective solution is to match aerial and ground images directly. However, since the two kinds of images differ hugely in viewpoint, matching them directly is very difficult. Here, this embodiment solves the problem by synthesizing ground-view reference images from the aerial map. The reference images are synthesized in the following two steps: virtual camera pose computation and graph-cut-based image synthesis.
Step S201, computing virtual camera poses based on the aerial map.
The virtual camera poses for reference image synthesis are computed based on the ground plane of the indoor scene; in this embodiment the ground plane of the aerial map is detected with a plane detection method based on random sample consensus (RANSAC) (see Fig. 2). The virtual camera poses are computed in two steps: positions first, then orientations.
Step S2011, virtual camera position computation.
The two-dimensional bounding box of the ground plane is computed and divided into square grid cells; the cell size determines the number of virtual cameras. To balance localization accuracy and efficiency, the cell side length is set to 1 m in this embodiment. For each cell, when the ratio of the ground-plane area within it to the total cell area is greater than 50%, the cell is considered a valid cell for placing a virtual camera. The virtual camera positions are set at the centers of the valid cells with a height offset of h above the plane (see Fig. 2). The value of h is determined by the height of the ground camera; it is set to 1 m in this embodiment.
Step S2012, virtual camera orientation design.
After the virtual camera positions are obtained, to observe the scene in all directions, multiple virtual cameras with identical optical centers but different orientations need to be placed at each virtual camera position. In this embodiment, since the optical axis of the camera mounted on the ground robot is approximately parallel to the ground plane, only horizontally oriented virtual cameras are generated here. In addition, to eliminate perspective projection distortion between the ground and composite images, the field of view (intrinsic parameters) of the virtual cameras needs to be set close to that of the ground camera. In this embodiment, 6 virtual cameras are placed at each virtual camera position, with a yaw angle of 60° between adjacent virtual cameras; a sketch of this placement is given below.
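The following is a minimal sketch of the placement described in steps S2011 and S2012, assuming the detected ground plane is given as a boolean occupancy grid at a known metric resolution; all names and the default resolution are illustrative assumptions:

```python
import numpy as np

def virtual_camera_poses(plane_mask, res=0.05, cell_m=1.0, h=1.0, num_yaw=6):
    """Place virtual cameras on a detected ground plane.

    plane_mask: 2D boolean occupancy grid of the ground plane, assumed to
                come from the RANSAC plane detection, at `res` meters/cell.
    Returns a list of (x, y, z, yaw) poses in the plane's frame.
    """
    cells = int(round(cell_m / res))            # mask cells per 1 m x 1 m cell
    poses = []
    H, W = plane_mask.shape
    for r in range(0, H - cells + 1, cells):
        for c in range(0, W - cells + 1, cells):
            block = plane_mask[r:r + cells, c:c + cells]
            if block.mean() <= 0.5:             # <=50% ground plane: invalid cell
                continue
            x = (c + cells / 2.0) * res         # cell center in meters
            y = (r + cells / 2.0) * res
            for k in range(num_yaw):            # 6 horizontal orientations
                yaw = k * (2.0 * np.pi / num_yaw)   # 60 degrees apart
                poses.append((x, y, h, yaw))
    return poses
```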
In addition, the path for ground robot motion is also planned from the detected ground plane. Since this embodiment does not focus on planning an optimal path for the ground robot, the skeleton of the detected ground plane is used here as the robot path; the skeleton is extracted by the medial axis transform (see Fig. 2).
Step S202, obtaining the composite images of the ground-view reference images from the aerial map with a graph-cut algorithm.
This embodiment synthesizes images by means of the continuous mesh in space. As shown in Fig. 3, f is a three-dimensional space facet whose two-dimensional projected triangles on the aerial camera C_a and the virtual ground camera C_v are denoted t_a and t_v respectively; the principle of image synthesis is to warp t_a to t_v via f. Specifically, the mesh visible to each aerial and virtual camera is obtained first. Then, for each virtual camera, its visible mesh is projected onto the camera to form a set of two-dimensional triangles. When synthesizing a virtual image, for a specific two-dimensional triangle in the virtual image, which aerial image is warped to fill this region is determined based on the following three factors: (1) visibility: for the three-dimensional space facet corresponding to the two-dimensional triangle, the selected aerial image should have a good viewing angle and a close viewing distance; (2) sharpness: since some of the images extracted from the indoor aerial video are blurry, sufficiently sharp aerial images should be chosen; (3) consistency: adjacent triangles in the virtual image should, as far as possible, be synthesized from the same aerial image to keep the composite image consistent. In this embodiment, the visibility factor is measured by the projected area of the space facet on the aerial image (the larger the better), and the sharpness factor is measured by the median scale of the local features of the aerial image (the smaller the better); see Fig. 4, where the left two columns are the two images with the largest median local feature scale, the right two columns are the two images with the smallest median local feature scale, and the second row shows enlargements of the rectangular regions in the first row. Based on the above, the image composition problem in this embodiment can be formulated as a multi-label optimization problem, defined as in formula (1):

E(l) = \sum_{t_i \in T} D_i(l_i) + \sum_{(t_i, t_j) \in N} V_{i,j}(l_i, l_j)    (1)

where E(l) is the energy function of the graph cut; T is the set of two-dimensional triangles projected from the three-dimensional space mesh visible to the virtual camera, and t_i is the i-th triangle therein; N is the set of shared edges between triangles in the projected two-dimensional triangle set; l_i is the label of t_i, i.e., the aerial image index. When the space facet corresponding to t_i is visible in the l_i-th aerial image, the data term D_i(l_i) = s_{l_i} / A_i(l_i), where s_{l_i} is the median scale of local features in the l_i-th aerial image and A_i(l_i) is the projected area of the space facet corresponding to t_i in the l_i-th aerial image; otherwise D_i(l_i) = α, where α is a large constant (α = 10^4 in this embodiment) to penalize this case. When l_i = l_j, the smoothness term V_{i,j}(l_i, l_j) = 0; otherwise V_{i,j}(l_i, l_j) = 1. The optimization problem defined in formula (1) can be solved efficiently by a graph-cut algorithm.
To illustrate the influence of the sharpness and consistency factors, image synthesis is performed on one of the virtual cameras under four different configurations in this embodiment; the results are shown in Fig. 5, from left to right: considering neither the sharpness factor nor the consistency factor; considering only the consistency factor; considering only the sharpness factor; considering both the sharpness and consistency factors. The large rectangle in the upper-right corner of each image is an enlargement of the small rectangle in the image. As seen from Fig. 5, the sharpness factor makes the composite image sharper, and the consistency factor gives the composite image fewer holes and sharp edges. In addition, Fig. 6 shows some further image synthesis results together with ground images under similar viewpoints. Although some synthesis errors are still hard to avoid, the composite images and the corresponding ground images have considerable similarity in the commonly visible regions, which verifies the validity of the image synthesis method in this embodiment. The composite images of this step serve as the reference database for ground robot localization.
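As an illustration, the following is a minimal sketch of assembling the data and smoothness costs of formula (1) for a multi-label graph-cut solver (e.g., alpha-expansion); the array layout and names are assumptions, and the solver itself is left to any standard graph-cut library:

```python
import numpy as np

ALPHA = 1.0e4   # penalty for facet/image pairs with no visibility

def build_costs(proj_area, feat_scale_median, visible):
    """Assemble the costs of formula (1).

    proj_area:          (num_tri, num_aerial) projected area A_i(l) of each
                        facet in each aerial image
    feat_scale_median:  (num_aerial,) median local-feature scale s_l per image
    visible:            (num_tri, num_aerial) boolean visibility mask
    Returns (data_cost, pairwise_cost); pairwise_cost is applied to every
    pair of triangles sharing an edge.
    """
    vis = visible & (proj_area > 0)
    data_cost = np.full(proj_area.shape, ALPHA)
    # D_i(l) = s_l / A_i(l) where the facet is visible in image l
    ratio = feat_scale_median[None, :] / np.maximum(proj_area, 1e-9)
    data_cost[vis] = ratio[vis]
    # Potts smoothness: V = 0 if the two labels agree, 1 otherwise
    num_img = proj_area.shape[1]
    pairwise_cost = 1.0 - np.eye(num_img)
    return data_cost, pairwise_cost
```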
Step S300, acquiring ground multi-view images with a ground camera to obtain the ground multi-view image set.
After the ground robot is placed in the indoor scene, it moves along the planned path and automatically acquires the ground multi-view video. If the robot were localized only by its built-in sensors, such as wheel encoders and an inertial measurement unit (IMU), it would not move strictly along the planned path. This is because robot built-in sensors suffer from accumulated error, a problem that is particularly obvious for the low-cost sensors mounted on consumer-grade robots. Therefore, the robot pose needs to be corrected by visual localization, which in this step is realized by matching the composite and ground images.
Step S301, the ground robot, following the planned path, continuously acquires the ground multi-view video through the ground camera mounted on it.
In this step, the localization method includes initial robot localization and localization during robot motion.
(1) Initial robot localization
Initial robot localization is performed as follows: the first frame of the video acquired by the ground camera is used to obtain the initial position of the robot in the aerial map, and this position is taken as the starting point of the robot's subsequent motion.
By localizing the first frame of the video acquired by the ground camera, the initial position of the robot in the aerial map is obtained, and this position is taken as the starting point of the robot's subsequent motion. This initial localization can be realized by matching the first frame image against all composite images or against the k most similar composite images retrieved through a vocabulary tree; this step uses the image-retrieval-based method, with k = 30. It should be noted that although ground-view images have been synthesized, the ground and composite images still differ considerably in illumination, sharpness, etc., and the common scale-invariant feature transform (SIFT) feature is insufficient to cope with this; what is used in this step is the ASIFT (affine-SIFT) feature.
To verify the validity of the image synthesis method in this step and to compare the performance of the SIFT and ASIFT features, this embodiment performs composite-to-ground image matching and aerial-to-ground image matching, using SIFT and ASIFT features respectively. The ground images are likewise extracted from the video acquired by the ground robot with the adaptive bag-of-words video frame extraction method of step S100. During image matching, different numbers of composite images and aerial images closest to the current ground image are retrieved; when the number of match points surviving fundamental-matrix verification is still greater than 16, the image pair is considered matched. The image matching results are shown in Fig. 7 (the x-axis is the number of retrieved images, the y-axis is the number of matched image pairs). As shown in Fig. 7, the number of matched pairs obtained by matching composite and ground images with ASIFT is about 6, 8 and 19 times that obtained by matching aerial and ground images with ASIFT, matching composite and ground images with SIFT, and matching aerial and ground images with SIFT, respectively.
Given the two-dimensional match points between the first ground frame and the retrieved composite images, the corresponding three-dimensional points on the aerial map can be obtained by ray casting. The first ground frame can then be localized with methods based on perspective-n-point (PnP). Specifically, given the 2D-to-3D correspondences and the ground camera intrinsics, the camera pose is solved by RANSAC with different PnP algorithms. The PnP algorithms used include P3P, AP3P and EPnP. When at least one of these algorithms yields more than 16 inliers, the pose estimation is considered a successful estimation, and the camera pose is set to the PnP result with the most inliers. In the RANSAC of this embodiment, 500 random samples are drawn in total, and the distance threshold is set to 4 px.
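A minimal sketch of this localization step is given below, using OpenCV's PnP solvers inside RANSAC with the embodiment's settings (500 iterations, 4 px threshold, more than 16 inliers required); the function name and data layout are illustrative:

```python
import cv2
import numpy as np

def localize_ground_frame(pts3d, pts2d, K):
    """pts3d: (N, 3) map points from ray casting; pts2d: (N, 2) ASIFT matches;
    K: 3x3 camera intrinsics. Returns (num_inliers, rvec, tvec) or None."""
    best = None
    for flag in (cv2.SOLVEPNP_P3P, cv2.SOLVEPNP_AP3P, cv2.SOLVEPNP_EPNP):
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
            iterationsCount=500, reprojectionError=4.0, flags=flag)
        if ok and inliers is not None and len(inliers) > 16:
            if best is None or len(inliers) > best[0]:
                best = (len(inliers), rvec, tvec)   # keep the most inliers
    return best   # None means localization failed for this frame
```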
(2) Localization during robot motion
The localization during robot motion is performed as follows: a coarse robot position is obtained from the initial position and the robot's motion data at each moment; by matching the video frame acquired at the current moment against the composite images, the robot's current position in the aerial map is obtained, and the coarse position is corrected with this position information.
While the ground robot moves through the indoor scene and acquires video, it can be coarsely localized by the wheel odometer. In this step, the ground robot is globally localized on the aerial map by matching ground and composite images, in order to correct the coarse localization result. Pose correction is applied only to the extracted ground video frames rather than to all video frames, because: (1) the ground robot moves relatively slowly indoors and will not deviate substantially from the planned path within a short time; (2) each global visual localization takes about 0.5 s, with the time mainly consumed by ASIFT feature extraction. It should be noted that, for some extracted video frames, visual localization may not succeed because the number of inliers for PnP is insufficient.
Suppose the position and orientation of the last successfully localized ground image are denoted c_A and n_A respectively, and the coarse position and orientation of the current ground image to be localized, obtained from the wheel odometer, are denoted c_B and n_B. Here, the candidate matching composite images of the current ground image are searched based on the coarse localization result, rather than by image retrieval. This method is illustrated in Fig. 8, where c_A and n_A are the position and orientation of the last successfully localized ground image, c_B and n_B are the coarse position and orientation of the current ground image, the circle indicates the search range with center c_B and radius r_B, the triangles represent virtual camera poses, light gray triangles represent selected composite images, and dark gray triangles represent unselected composite images. A composite image is matched against the current ground image when it satisfies the following two conditions: (1) the composite image lies within the circle with center c_B and radius r_B, where r_B = max(‖c_B - c_A‖, β) and β = 2 m; (2) the angle between the composite image orientation and n_B is less than 90°. A variable radius r_B is used here because, as the robot moves, the drift of the relative pose obtained by the robot's built-in sensors becomes increasingly severe. After the current ground image is matched against the obtained candidate matching composite images, the current ground image is localized by the PnP-based RANSAC method similar to that of initial robot localization. If the localization result is sufficiently close to the coarse localization result in position and orientation (in this embodiment, position deviation less than 5 m and orientation deviation less than 30°), the current ground frame is localized successfully. The robot pose is then corrected by the currently successfully localized ground image, and the wheel odometer pose is reset to the current vision-based localization result. The ground images not successfully localized in this step will be re-localized in the subsequent indoor scene reconstruction process.
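For illustration, a minimal sketch of the candidate composite image selection under the two conditions above (names are illustrative; positions and unit direction vectors are assumed to be given in the aerial map frame):

```python
import numpy as np

BETA = 2.0   # minimum search radius in meters

def candidate_composites(c_A, c_B, n_B, virt_positions, virt_dirs):
    """Select candidate composite images around the odometry pose.

    c_A: position of the last successfully localized frame
    c_B, n_B: coarse position / unit viewing direction from the wheel odometer
    virt_positions, virt_dirs: (M, 3) virtual camera centers and unit directions
    """
    r_B = max(np.linalg.norm(c_B - c_A), BETA)      # radius grows with drift
    keep = []
    for k, (p, d) in enumerate(zip(virt_positions, virt_dirs)):
        inside = np.linalg.norm(p - c_B) <= r_B     # condition (1): in the circle
        facing = np.dot(d, n_B) > 0.0               # condition (2): angle < 90 deg
        if inside and facing:
            keep.append(k)
    return keep
```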
Step S302, for the ground multi-view video of the indoor scene, extracting image frames with the adaptive video frame extraction method based on the bag-of-words model, to obtain the ground multi-view image set of the indoor scene.
This step extracts image frames from the acquired ground multi-view video of the indoor scene with the adaptive bag-of-words video frame extraction method of step S100, obtaining the ground multi-view image set of the indoor scene; since the method is the same, it is not repeated here.
Step S400, based on the composite images, fusing the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
After robot localization and video acquisition, not all frames extracted from the ground video have been successfully localized to the aerial map. However, to obtain complete indoor scene reconstruction results, all images extracted from the (aerial and ground) videos need to be localized and fused. Here, a batch localization process is first proposed for the ground images not previously localized. Then the match points between ground and composite images are connected into the original feature point tracks, and the fusion of the aerial and ground point clouds is realized through bundle adjustment (BA, bundle adjustment). Finally, complete and dense indoor scene reconstruction results are obtained by fusing the aerial and ground images.
Step S401, obtaining the position in the aerial map of the ground camera corresponding to each image in the ground multi-view image set.
For the ground images not successfully localized in step S301, the invention proposes a batch camera localization process. In each camera localization loop, as many cameras as possible are localized. The three-dimensional points of the 2D-to-3D correspondences used here for camera localization include not only the spatial points reconstructed during SfM but also the spatial points obtained by intersecting cast rays with the aerial map (the three-dimensional mesh). Each batch camera localization loop includes three steps: (1) camera localization, (2) scene extension and bundle adjustment (BA, bundle adjustment), (3) camera filtering; the flow is shown in Fig. 9.
Before batch camera localization, the images extracted from the ground video are first matched, and the match points are connected into feature point tracks. For the feature point tracks visible in at least two successfully localized images, their space coordinates are computed by triangulation.
Step S4011, camera localization.
There are two ways to obtain 2D-to-3D correspondences for localizing a currently unlocalized ground image: (1) the aerial map: for the two-dimensional feature points in the currently unlocalized ground image, their match points in the successfully localized images are available; rays are then cast from the optical centers of the localized cameras through these match points, and the intersections of the cast rays with the aerial map are the three-dimensional points corresponding to the two-dimensional feature points in the currently unlocalized ground image; (2) the ground feature point tracks: given the ground feature point tracks currently obtained by triangulation, the corresponding two-dimensional feature points in the currently unlocalized ground image can be obtained through the previous matching results between ground images. A currently unlocalized ground camera can be localized using both kinds of 2D-to-3D correspondences through the PnP-based RANSAC method, and of the two results the one with the more inliers is used. The method realizing camera localization with both kinds of 2D-to-3D correspondences is compared here with the methods using only one of them; the results are shown in Fig. 10, which shows the batch camera localization results (1) based on both the aerial map and the ground feature point tracks, (2) based only on the aerial map, and (3) based only on the ground feature point tracks; the x-axis is the batch camera localization loop number and the y-axis is the number of successfully localized cameras; at x = 0, the y value corresponds to the number of cameras successfully localized in step S300. As shown in Fig. 10, after several iterations the three methods localize the same number of cameras. However, the method realizing camera localization with both kinds of 2D-to-3D correspondences needs the fewest iterations in this embodiment (it needs only 5, while the other two methods need 6 and 8 respectively).
Step S4012, scene extension and BA.
After camera localization, the ground feature point tracks are triangulated according to the newly localized cameras to realize scene extension. To improve the precision of the camera poses and scene points, after triangulation the localized ground camera poses and the spatial positions of the triangulated ground feature point tracks are optimized by BA.
Step S4013, camera filtering.
Considering the robustness of the method, a camera filtering operation for the successfully localized cameras is added after BA. If the deviation between the BA-optimized position or orientation of a camera newly localized in the current iteration and its coarse localization result (the localization result obtained by the wheel odometer) is large (position deviation greater than 5 m or orientation deviation greater than 30°), the localization result is judged unreliable and is filtered out. It should be noted that a camera filtered out in the current iteration may still be successfully localized in subsequent iterations of this step.
The above three steps iterate until all cameras are successfully localized or no more cameras can be successfully localized. The batch camera localization process is shown in Fig. 11, where the pyramids indicate successfully localized camera poses; the 0-th iteration denotes the camera localization results of step S300.
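For illustration, the overall loop can be sketched as follows; the three injected callables stand for the sub-steps described above and are hypothetical placeholders, not part of the original disclosure:

```python
def batch_localization(unlocalized, localize_one, extend_and_ba, is_outlier):
    """Iterative batch camera localization (steps S4011-S4013).

    unlocalized:   set of camera ids not yet localized
    localize_one:  cam -> pose or None (PnP-RANSAC over both kinds of
                   2D-to-3D correspondences, keeping the result with
                   the more inliers)
    extend_and_ba: list of cams -> None (triangulation of new tracks
                   followed by bundle adjustment)
    is_outlier:    cam -> bool (position > 5 m or orientation > 30 deg
                   away from the wheel-odometer pose)
    """
    poses = {}
    while unlocalized:
        new = []
        for cam in list(unlocalized):             # step (1): camera localization
            pose = localize_one(cam)
            if pose is not None:
                poses[cam] = pose
                unlocalized.discard(cam)
                new.append(cam)
        if not new:                               # no progress: stop iterating
            break
        extend_and_ba(new)                        # step (2): extension and BA
        for cam in new:                           # step (3): camera filtering
            if is_outlier(cam):
                del poses[cam]
                unlocalized.add(cam)              # may relocalize later
    return poses
```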
Step S402, connecting the match points between the ground multi-view images and the composite images into the original aerial and ground feature point tracks, generating cross-view constraints.
To fuse the aerial and ground point clouds through BA, constraints between aerial and ground images need to be introduced. Here, these cross-view constraints can be provided by the image match points obtained in step S300 by matching ground and composite images, together with the aerial and ground feature point tracks. The matched ground image feature points can easily be connected into the original ground feature point tracks by looking up their indices. However, although the composite images are generated from aerial images, connecting the matched composite image feature points into the original aerial feature point tracks is less easy, because the composite image feature points matched with the ground images are extracted anew on the composite images. In this step, the ground-to-composite image match points are extended to the aerial views by ray casting and point projection; the process is illustrated in Fig. 12, where C_i (i = 1, 2, 3) are aerial cameras, X_j (j = 1, 2) are the spatial points corresponding to matched composite image feature points, t_ij is the projection of point X_j on camera C_i, and t_1j-t_2j-t_3j (j = 1, 2) is the j-th feature point track across the aerial views. Specifically, the spatial points corresponding to the matched composite image feature points are first obtained on the aerial map by ray casting; the obtained spatial points are then projected onto the aerial images in which they are visible, generating the aerial and ground feature point tracks.
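A minimal sketch of this lifting-and-projection step is given below, using the trimesh library for ray-mesh intersection; the pinhole projection and the data layout are assumptions, and for brevity the per-view occlusion test used to decide visibility is omitted:

```python
import numpy as np
import trimesh

def lift_and_project(mesh, cam_center, rays, aerial_cams):
    """Lift matched composite-image feature points onto the aerial map and
    project them into the aerial views to form cross-view tracks.

    mesh:        trimesh.Trimesh of the aerial map
    cam_center:  (3,) optical center of the virtual (composite) camera
    rays:        (N, 3) unit viewing rays through the matched feature points
    aerial_cams: list of (K, R, t) tuples, one per aerial image
    """
    origins = np.tile(cam_center, (len(rays), 1))
    hits, ray_ids, _ = mesh.ray.intersects_location(
        origins, rays, multiple_hits=False)       # first surface hit per ray
    tracks = {}
    for X, r in zip(hits, ray_ids):
        track = []
        for img_id, (K, R, t) in enumerate(aerial_cams):
            x = K @ (R @ X + t)                    # pinhole projection
            if x[2] <= 0:                          # point behind this camera
                continue                           # (occlusion test omitted)
            track.append((img_id, x[:2] / x[2]))
        tracks[int(r)] = track
    return tracks
```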
Step S403, optimizing the aerial map and the ground multi-view image point cloud by BA.
In this step, using the Ceres library, the connected aerial-and-ground feature point tracks, the original (aerial and ground) feature point tracks, and the intrinsic and extrinsic parameters of all (aerial and ground) cameras are globally optimized by minimizing the reprojection error.
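The quantity being minimized is the standard reprojection error over all cameras and tracks. A minimal sketch of such a residual is given below, in Python rather than Ceres, with an angle-axis camera parameterization and shared intrinsics assumed for brevity:

```python
import numpy as np

def reprojection_residuals(params, n_cams, K, cam_idx, pt_idx, obs_2d):
    """Residuals of the global bundle adjustment: one 2D error per
    observation of an (original or cross-view) feature point track.

    params: 6 values per camera (angle-axis rotation, translation),
            followed by 3 values per track point.
    """
    cams = params[:6 * n_cams].reshape(-1, 6)
    pts = params[6 * n_cams:].reshape(-1, 3)
    res = []
    for c, p, uv in zip(cam_idx, pt_idx, obs_2d):
        rvec, t = cams[c, :3], cams[c, 3:]
        theta = np.linalg.norm(rvec)
        k = rvec / theta if theta > 1e-12 else rvec
        X = pts[p]
        # Rodrigues rotation of the point, then pinhole projection
        Xr = (X * np.cos(theta) + np.cross(k, X) * np.sin(theta)
              + k * np.dot(k, X) * (1.0 - np.cos(theta)))
        x = K @ (Xr + t)
        res.append(x[:2] / x[2] - uv)
    return np.concatenate(res)
```

Under these assumptions, the residual vector can be fed to any nonlinear least-squares solver, e.g. scipy.optimize.least_squares, in the same way Ceres minimizes the sum of squared reprojection errors.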
Step S404, using the aerial and ground camera poses optimized in step S403, fusing the aerial and ground images for dense reconstruction to obtain the dense model of the indoor scene.
Since cross-view constraints between the aerial and ground views are introduced in the optimization process, and the aerial and ground images are fused in the dense reconstruction process, the reconstructed model is more complete and accurate than a model reconstructed from single-source images only.
To verify the scene modeling method fusing aerial and ground multi-view images of this embodiment of the present invention, two indoor scene datasets were collected with the aerial and ground metadata acquisition equipment shown in Figure 13, and the method of this embodiment was tested on these two datasets.
1. Datasets
Since there is currently almost no public aerial-and-ground image dataset for indoor scenes, two indoor scene datasets were collected for method assessment in this test. Specifically, aerial-view scene acquisition was performed with a DJI Spark mini aircraft, and ground-view scene acquisition with a GoPro HERO4 mounted on a TurtleBot; the metadata acquisition equipment is shown in Figure 13, from left to right the TurtleBot on the ground, the DJI Spark in the air, and the DJI Spark on a desktop. The collected aerial and ground metadata are videos with a resolution of 1080p and a frame rate of 25 FPS. The two collected indoor scene datasets are called Room and Hall, respectively; some information about them is given in Table 1. Example aerial images and the generated three-dimensional aerial maps of the Room and Hall datasets are shown in Figure 2 and Figure 14, respectively. As Figure 2 and Figure 14 show, the aerial map of the Hall dataset is of lower quality and larger scale than that of the Room dataset. Nevertheless, the subsequent evaluation shows that the proposed method achieves the expected results on both datasets, which indicates that it has good robustness and scalability.
Table 1
| Dataset | Room | Hall |
|---|---|---|
| Aerial video length/s | 218 | 494 |
| Ground video length/s | 61 | 113 |
| Coverage area/m² | 30 | 130 |
In addition, the virtual camera pose computation and robot path planning results on the Room and Hall datasets are shown at the far right of Figure 2 and Figure 14, respectively. As shown, with the method of the invention the ground plane used for virtual camera pose computation and robot path planning is detected successfully, and the generated virtual cameras and robot paths cover the indoor scene fairly evenly. The virtual camera pose computation method of the invention generates 60 and 384 virtual cameras on the Room and Hall datasets, respectively. In Figure 14, the first three columns show example aerial images and the corresponding regions of the three-dimensional aerial map; the fourth column shows the entire three-dimensional aerial map; the fifth column shows the robot path planning and virtual camera pose computation results on the aerial map, where the ground plane is marked in light gray, the planned path is marked by line segments, and the virtual camera poses are indicated by pyramids.
2. Adaptive frame extraction results
The adaptive frame extraction method of the invention extracted 271 and 112 frames from the aerial and ground videos of the Room dataset, respectively, and 721 and 250 frames from the aerial and ground videos of the Hall dataset. To verify the effectiveness of the proposed frame extraction, a comparative experiment between the method of the invention and equal-interval frame extraction was conducted on the aerial video of the Hall dataset. The adaptive frame extraction method of the invention extracted 721 frames from the 494 s video at 25 FPS; the equal-interval method extracted one frame every 17 frames (494 × 25 / 721 ≈ 17), for a total of 730 frames. The video frames obtained by the two extraction methods were then calibrated with the open-source SfM system COLMAP; the results are shown in Figure 15. Left: the COLMAP result on the adaptively extracted frames, where the video frames are calibrated successfully. Middle and right: the COLMAP results on the equal-interval frames, where the video frames are calibrated but broken into two parts; the middle and right figures correspond to the two rectangular regions of the left figure, and the circled portions of the left and right figures show the comparison at the same corner. As Figure 15 shows, the video frames extracted by the method of the invention have better connectivity than the equally spaced ones, so a consistent aerial map can be obtained by reconstructing from them. In addition, the black circles in Figure 15 show that, to obtain a more complete aerial map, denser frame extraction is needed at corners.
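For concreteness, a small sketch of the equal-interval baseline arithmetic used above (function and variable names are illustrative):

```python
def equal_interval_indices(duration_s: int, fps: int, target_count: int):
    """Frame indices for the equal-interval baseline: a 494 s video at
    25 FPS has 494 * 25 = 12350 frames, and matching the adaptive
    method's 721 frames gives a step of round(12350 / 721) = 17."""
    total_frames = duration_s * fps
    step = max(1, round(total_frames / target_count))
    return list(range(0, total_frames, step))

frames = equal_interval_indices(494, 25, 721)
print(len(frames))  # 727 frames, of the same order as the 730 reported
```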
3. Ground camera localization results
To verify the batch camera localization and the aerial-ground image fusion of the invention, the batch camera localization result, the camera localization result after fusing the aerial and ground images, and the COLMAP result are compared here qualitatively and quantitatively. It should be noted that for COLMAP the ground camera poses were not initialized but calibrated from the images alone, i.e., the camera localization result of step S300 based on the aerial map was not supplied to COLMAP as a prior.

The qualitative comparison is shown in Figure 16. First row: Room dataset results; second row: Hall dataset results; from left to right: the result after fusing the aerial and ground images, the batch ground-camera localization result, and the COLMAP calibration result; the rectangles mark erroneous camera poses. As Figure 16 shows, for the Room dataset the camera poses obtained by the three compared methods are quite similar, because the scene structure of the Room dataset is relatively simple. For the Hall dataset, however, the camera trajectory computed by COLMAP exhibits obvious errors in the left part of the scene. This is because repeated and weak textures cause the matches between ground images to contain many outlier matches, which leads the incremental SfM system to produce fairly obvious scene drift. In contrast, for the batch camera localization, since part of the ground images are initially localized to the aerial map, the localization result shows only some slighter scene drift. Moreover, the above erroneous camera poses are corrected in the subsequent aerial-ground image fusion stage, because the connected aerial and ground feature point tracks are introduced into the global optimization during image fusion. These results show that fusing aerial and ground images makes ground camera localization more robust than using ground images alone.
4. Indoor scene reconstruction results
Finally, the indoor scene reconstruction algorithm of the invention is assessed qualitatively and quantitatively. This test compares the indoor reconstruction result of the invention with the results reconstructed from aerial or ground images alone. The qualitative comparison is shown in Figure 17. First column: Room dataset results; second column: enlarged views of the rectangular regions in the first column; third column: Hall dataset results; fourth column: enlarged views of the rectangular regions in the third column; from top to bottom: the results using only ground images, using only aerial images, and using the fusion of aerial and ground images. It should be noted that (1) for the indoor reconstruction algorithm of the invention, the camera poses used are those obtained after fusing the aerial and ground images; (2) for the method using only ground images, the camera poses used are those after batch camera localization; (3) for the method using only aerial cameras, the camera poses used are those estimated by SfM. As Figure 17 shows, although occlusion and weak textures still inevitably leave some regions missing in the reconstruction results, the indoor reconstruction obtained by fusing aerial and ground images is more complete than the reconstruction from either single image source alone.
A second embodiment of the present invention is a scene modeling system fusing aerial and ground multi-view images. The system includes an aerial map construction module, a synthesized image acquisition module, a multi-view image set acquisition module, and an indoor scene model acquisition module.

The aerial map construction module is configured to acquire the aerial multi-view images of the indoor scene to be modeled and construct the aerial map.

The synthesized image acquisition module is configured to obtain, based on the aerial map, synthesized images by the method of synthesizing ground-view reference images from the aerial map.

The multi-view image set acquisition module is configured to obtain the ground multi-view image set from the ground multi-view images acquired by the ground camera.

The indoor scene model acquisition module is configured to fuse, based on the synthesized images, the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
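Purely as an illustration of this module decomposition (all class and method names are hypothetical, not from the patent), a skeleton of the system might look like:

```python
class AerialMapBuilder:
    def build(self, aerial_images):
        """Construct the aerial map from the aerial multi-view images (step S100)."""
        ...

class SynthesizedImageGenerator:
    def synthesize(self, aerial_map):
        """Render ground-view reference images from the aerial map (step S200)."""
        ...

class GroundImageCollector:
    def collect(self, ground_camera):
        """Gather the ground multi-view image set (step S300)."""
        ...

class IndoorSceneModeler:
    def fuse(self, synthesized_images, aerial_images, ground_images):
        """Fuse the aerial and ground views into the indoor scene model (step S400)."""
        ...
```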
Persons skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system described above and the related explanations may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.

It should be noted that, in the scene modeling system fusing aerial and ground multi-view images provided by the above embodiment, the division into the above functional modules is only an example; in practical applications, the above functions may be allocated to different functional modules as needed, i.e., the modules or steps of the embodiment of the present invention may be decomposed or recombined. For example, the modules of the above embodiment may be merged into one module, or may be further split into multiple submodules, to accomplish all or part of the functions described above. The names of the modules and steps involved in the embodiment of the present invention are only for distinguishing the respective modules or steps and are not to be regarded as improper limitations of the present invention.
A third embodiment of the present invention is a storage device in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to realize the above scene modeling method fusing aerial and ground multi-view images.

A fourth embodiment of the present invention is a processing device, including a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to realize the above scene modeling method fusing aerial and ground multi-view images.

Persons skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes and related explanations of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.
Those skilled in the art should recognize that the modules and method steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; the programs corresponding to the software modules and method steps can be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field. In order to clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in electronic hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to realize the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The terms "first", "second", and the like are used to distinguish similar objects, not to describe or indicate a specific order or precedence.

The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, such that a process, method, article, or device/apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or device/apparatus.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art may make equivalent changes or replacements to the relevant technical features, and the technical solutions after such changes or replacements will fall within the protection scope of the present invention.
Claims (10)
1. A scene modeling method fusing aerial and ground multi-view images, characterized by comprising the following steps:
Step S100: acquiring the aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map;
Step S200: based on the aerial map, obtaining synthesized images by a method of synthesizing ground-view reference images from the aerial map;
Step S300: obtaining a ground multi-view image set from the ground multi-view images acquired by a ground camera;
Step S400: based on the synthesized images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model.
2. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that in step S100, "acquiring the aerial multi-view images of the indoor scene to be modeled, and constructing an aerial map" is performed as follows:
extracting image frames from the aerial multi-view video of the indoor scene by an adaptive video frame extraction method based on bag of words, obtaining the aerial multi-view image set of the indoor scene;
based on the aerial multi-view image set, constructing the aerial map by an image-based modeling method.
3. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that in step S200, "the method of synthesizing ground-view reference images from the aerial map" is as follows:
based on the aerial map, computing virtual camera poses;
obtaining, by a graph-cut algorithm, the synthesized images of the ground-view reference images from the aerial map.
4. The scene modeling method fusing aerial and ground multi-view images according to claim 3, characterized in that "obtaining, by a graph-cut algorithm, the synthesized images of the ground-view reference images from the aerial map" is performed with the energy function

E(l) = Σ_{t_i∈T} D_i(l_i) + Σ_{(t_i,t_j)∈N} V_i(l_i, l_j)

where E(l) is the energy function of the graph cut; T is the set of two-dimensional triangles obtained by projecting the three-dimensional spatial mesh visible to the virtual camera, and t_i is the i-th triangle therein; N is the set of common edges of the projected two-dimensional triangle set; l_i is the aerial image index of t_i; D_i(l_i) is the data term; V_i(l_i, l_j) is the smoothness term;

when the spatial patch corresponding to t_i is visible in the l_i-th aerial image, the data term is D_i(l_i) = f̄_{l_i}/A_{l_i}, and otherwise D_i(l_i) = α, where f̄_{l_i} is the scale median of the local features in the l_i-th aerial image, A_{l_i} is the projected area in the l_i-th aerial image of the spatial patch corresponding to t_i, and α is a large constant;

when l_i = l_j, the smoothness term is V_i(l_i, l_j) = 0; otherwise V_i(l_i, l_j) = 1.
5. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that in step S300, "obtaining a ground multi-view image set from the ground multi-view images acquired by a ground camera" is performed as follows:
a ground robot, based on a planned path, continuously acquires the ground multi-view video through the ground camera arranged on it;
image frames are extracted from the ground multi-view video of the indoor scene by the adaptive video frame extraction method based on bag of words, obtaining the ground multi-view image set of the indoor scene.
6. The scene modeling method fusing aerial and ground multi-view images according to claim 5, characterized in that during "a ground robot, based on a planned path, continuously acquires the ground multi-view video through the ground camera arranged on it", the localization method includes initial robot localization and robot motion localization;
the initial robot localization is performed as follows: from the first frame of the video acquired by the ground camera, the initial position of the robot in the aerial map is obtained, and this position is taken as the starting point of the robot's subsequent motion;
the robot motion localization is performed as follows: coarse localization of the robot position is performed based on the initial position and the motion data of the robot at each moment; by matching the video frame image acquired at the current moment with the synthesized images, the position of the robot in the aerial map at the current moment is obtained, and the location information of the coarse localization is corrected with this position.
7. The scene modeling method fusing aerial and ground multi-view images according to claim 1, characterized in that step S400, "based on the synthesized images, fusing the aerial multi-view images with the ground multi-view images to obtain an indoor scene model", is performed as follows:
obtaining the position in the aerial map of the ground camera corresponding to each image in the ground multi-view image set;
connecting the match points between the ground multi-view images and the synthesized images into the original aerial and ground feature point tracks, generating cross-view constraints;
optimizing the aerial map and the ground multi-view image point cloud by bundle adjustment;
performing dense reconstruction using the aerial map and the ground multi-view images, obtaining a dense model of the indoor scene.
8. A scene modeling system fusing aerial and ground multi-view images, characterized in that the system includes an aerial map construction module, a synthesized image acquisition module, a multi-view image set acquisition module, and an indoor scene model acquisition module;
the aerial map construction module is configured to acquire the aerial multi-view images of the indoor scene to be modeled and construct the aerial map;
the synthesized image acquisition module is configured to obtain, based on the aerial map, synthesized images by a method of synthesizing ground-view reference images from the aerial map;
the multi-view image set acquisition module is configured to obtain the ground multi-view image set from the ground multi-view images acquired by the ground camera;
the indoor scene model acquisition module is configured to fuse, based on the synthesized images, the aerial multi-view images with the ground multi-view images to obtain the indoor scene model.
9. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to realize the scene modeling method fusing aerial and ground multi-view images according to any one of claims 1-7.
10. A processing device, including a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; characterized in that the programs are adapted to be loaded and executed by the processor to realize the scene modeling method fusing aerial and ground multi-view images according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910502762.4A CN110223380B (en) | 2019-06-11 | 2019-06-11 | Scene modeling method, system and device fusing aerial photography and ground visual angle images |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223380A (en) | 2019-09-10 |
CN110223380B CN110223380B (en) | 2021-04-23 |
Family
ID=67816534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910502762.4A Active CN110223380B (en) | 2019-06-11 | 2019-06-11 | Scene modeling method, system and device fusing aerial photography and ground visual angle images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223380B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114332383B (en) * | 2022-03-17 | 2022-06-28 | 青岛市勘察测绘研究院 | Scene three-dimensional modeling method and device based on panoramic video |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110181589A1 (en) * | 2010-01-28 | 2011-07-28 | The Hong Kong University Of Science And Technology | Image-based procedural remodeling of buildings |
CN104205826A (en) * | 2012-04-03 | 2014-12-10 | 三星泰科威株式会社 | Apparatus and method for reconstructing high density three-dimensional image |
CN103198524A (en) * | 2013-04-27 | 2013-07-10 | 清华大学 | Three-dimensional reconstruction method for large-scale outdoor scene |
US20160005145A1 (en) * | 2013-11-27 | 2016-01-07 | Google Inc. | Aligning Ground Based Images and Aerial Imagery |
CN104021586A (en) * | 2014-05-05 | 2014-09-03 | 深圳市城市管理监督指挥中心 | Air-ground integrated city ecological civilization managing system and method based on Beidou positioning |
JP2016045825A (en) * | 2014-08-26 | 2016-04-04 | 三菱重工業株式会社 | Image display system |
CN105843223A (en) * | 2016-03-23 | 2016-08-10 | 东南大学 | Mobile robot three-dimensional mapping and obstacle avoidance method based on space bag of words model |
CN107223344A (en) * | 2017-01-24 | 2017-09-29 | 深圳大学 | The generation method and device of a kind of static video frequency abstract |
CN107356230A (en) * | 2017-07-12 | 2017-11-17 | 深圳市武测空间信息有限公司 | A kind of digital mapping method and system based on outdoor scene threedimensional model |
CN109472865A (en) * | 2018-09-27 | 2019-03-15 | 北京空间机电研究所 | It is a kind of based on iconic model draw freedom can measure panorama reproducting method |
CN109862275A (en) * | 2019-03-28 | 2019-06-07 | Oppo广东移动通信有限公司 | Electronic equipment and mobile platform |
Non-Patent Citations (5)
Title |
---|
OZGE C. OZCANLI ET AL.: "Geo-localization using Volumetric Representations of Overhead Imagery", 《INTERNATIONAL JOURNAL OF COMPUTER VISION》 * |
XIANG GAO ET AL.: "Accurate and efficient ground-to-aerial model alignment", 《PATTERN RECOGNITION》 * |
XIANG GAO ET AL.: "Ground and aerial meta-data integration for localization and reconstruction: A review", 《PATTERN RECOGNITION LETTERS》 * |
XIANG GAO ET AL.: "Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds", 《ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING》 *
XU XIMEI ET AL.: "UAV autonomous localization technology aided by ground scene information", 《COMPUTER MEASUREMENT & CONTROL》 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112815923A (en) * | 2019-11-15 | 2021-05-18 | 华为技术有限公司 | Visual positioning method and device |
US20220375220A1 (en) * | 2019-11-15 | 2022-11-24 | Huawei Technologies Co., Ltd. | Visual localization method and apparatus |
CN111723681A (en) * | 2020-05-28 | 2020-09-29 | 北京三快在线科技有限公司 | Indoor road network generation method and device, storage medium and electronic equipment |
CN111723681B (en) * | 2020-05-28 | 2024-03-08 | 北京三快在线科技有限公司 | Indoor road network generation method and device, storage medium and electronic equipment |
CN112539758A (en) * | 2020-12-16 | 2021-03-23 | 中铁大桥勘测设计院集团有限公司 | Ground route drawing method and system in aerial video |
CN112539758B (en) * | 2020-12-16 | 2023-06-02 | 中铁大桥勘测设计院集团有限公司 | Ground line drawing method and system in aerial video |
CN112651881A (en) * | 2020-12-30 | 2021-04-13 | 北京百度网讯科技有限公司 | Image synthesis method, apparatus, device, storage medium, and program product |
CN112651881B (en) * | 2020-12-30 | 2023-08-01 | 北京百度网讯科技有限公司 | Image synthesizing method, apparatus, device, storage medium, and program product |
WO2023284715A1 (en) * | 2021-07-15 | 2023-01-19 | 华为技术有限公司 | Object reconstruction method and related device |
Also Published As
Publication number | Publication date |
---|---|
CN110223380B (en) | 2021-04-23 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |