CN109255833A - Wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization - Google Patents

Wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization

Info

Publication number
CN109255833A
CN109255833A (application CN201811157420.5A)
Authority
CN
China
Prior art keywords
super-pixel
depth
image
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811157420.5A
Other languages
Chinese (zh)
Inventor
姚拓中
安鹏
何加铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University of Technology
Original Assignee
Ningbo University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University of Technology filed Critical Ningbo University of Technology
Priority to CN201811157420.5A priority Critical patent/CN109255833A/en
Publication of CN109255833A publication Critical patent/CN109255833A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization. The method comprises the steps of: providing images from several different viewpoints and applying super-pixel segmentation to all of them, dividing each image into a set of locally homogeneous, irregularly shaped super-pixels; obtaining the initial depth of each super-pixel and the positional relationship between the different viewpoint images, and, using high-level semantic priors as a constraint, merging neighbouring super-pixel regions that lie in the same plane; performing depth estimation on all merged regions with a Markov random field model to obtain an initial depth map; and discarding erroneous depth estimates and removing redundant depth information from the initial depth map by depth fusion to obtain the final 3D scene. The method achieves more stable and accurate 3D reconstruction than conventional methods under different wide-baseline conditions.

Description

Wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization
Technical field
The present invention relates to the field of image processing, and in particular to a wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization.
Background technique
As a major research hotspot in computer vision, 3D scene reconstruction has been widely studied and applied in numerous areas such as aerospace, autonomous driving and digital entertainment. Traditional 3D scene reconstruction starts from an image sequence captured at multiple different viewpoints, estimates the camera poses with motion-based structure recovery (Structure From Motion, SFM), and presents the scene in 3D as a sparse point cloud or a dense model. One of the key problems in realizing this technique is how to accurately find the correspondences between images taken from different viewpoints. Because the position and pose of the camera at capture time are usually arbitrary, there is often large motion between cameras (i.e. a long baseline between the camera optical centres), which causes significant occlusion and geometric deformation between viewpoints and greatly increases the difficulty of image matching. This is the classical wide-baseline matching problem. It arises in many application fields such as robot visual navigation, aerial map building and augmented reality, and is of important research significance.
The wide-baseline image matching problem was first posed in 1998 by Pritchett and Zisserman of the Oxford University robotics research group; since then much research has focused on designing more robust features for estimating the essential matrix. Tuytelaars et al. and Xiao et al. used affine invariants, and many other works employed SIFT descriptors, the speed-oriented Daisy descriptor, or other scale-invariant descriptors. In addition, Bay et al. and Micusik et al. respectively used line segments, and rectangles composed of line segments, as features; region features such as MSER and texture descriptors have also been used in wide-baseline environments, and descriptor design increasingly considers robustness to occlusion. In dense scene reconstruction, point and region features are very widely used, e.g. SIFT-flow and PatchMatch; spatial pyramid matching and deformation models also facilitate scene reconstruction in wide-baseline environments. In general, region-based matching is one of the main trends under current wide-baseline conditions, since regions can reflect mutual similarity or difference more robustly and accurately than points and lines.
It is worth noting that triangulation-based geometry estimation in SFM requires the camera motion between adjacent views to be small, which is usually not satisfied under wide-baseline conditions. At present, many research achievements use artificial intelligence techniques to realize depth estimation, 3D structure inference and semantic annotation on a single image. Some studies have begun to use the semantic information inferred from a single image to improve traditional multi-view-geometry depth estimation, the sparse 3D point cloud estimation of SLAM visual navigation systems, and the accuracy of dense 3D model reconstruction. However, the vast majority of the above work, whether sparse or dense 3D reconstruction, is almost entirely based on narrow-baseline environments. Three-dimensional reconstruction methods that fuse geometry and semantics are becoming one of the development trends; this also better matches the way humans perceive scenes, and it can likewise play a role in wide-baseline 3D reconstruction. It is therefore an urgent problem to provide a 3D reconstruction method that is highly accurate under different wide-baseline environments.
Summary of the invention
The technical scheme of the present invention is a wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization, the method comprising the steps:
S1, providing images from several different viewpoints and applying super-pixel segmentation to all images, dividing each into a set of locally homogeneous, irregularly shaped super-pixels;
S2, obtaining the initial depth of each super-pixel and the positional relationship between the different viewpoint images, and, using high-level semantic priors as a constraint, merging neighbouring super-pixel regions that lie in the same plane;
S3, performing depth estimation on all merged regions with a Markov random field model to obtain an initial depth map;
S4, discarding erroneous depth estimates and removing redundant depth information from the initial depth map by depth fusion, obtaining the final 3D scene.
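The four steps S1 to S4 can be sketched as the following control flow. This is a hypothetical toy skeleton, not the patented implementation: every function body is a stand-in (the actual super-pixel segmentation, MRF inference, semantic labelling and depth fusion are far more involved), and all names are illustrative.

```python
# Toy sketch of the S1-S4 pipeline; all function bodies are stand-ins.

def segment_superpixels(images):
    # S1: pretend each image splits into three irregular super-pixels
    return [[(img, k) for k in range(3)] for img in images]

def merge_regions(superpixels, semantic_prior):
    # S2: keep/merge neighbouring super-pixels the prior deems coplanar
    return [sp for img_sps in superpixels for sp in img_sps
            if semantic_prior(sp)]

def mrf_depth(regions):
    # S3: one depth value per merged region (toy constant here)
    return {r: 1.0 for r in regions}

def fuse_depths(depth_map):
    # S4: drop "erroneous" estimates (toy rule: keep positive depths)
    return {r: d for r, d in depth_map.items() if d > 0}

def reconstruct(images):
    sps = segment_superpixels(images)              # S1
    regions = merge_regions(sps, lambda sp: True)  # S2
    depths = mrf_depth(regions)                    # S3
    return fuse_depths(depths)                     # S4
```

For two input views, `reconstruct(["view0", "view1"])` returns a depth estimate per merged region; in the real method the loop S2 to S3 is additionally iterated (the recursive framework described below).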
Preferably, in step S2, when performing region merging, a Markov random field model is used to infer the position and orientation of each super-pixel plane in the different viewpoint images, so as to determine whether two super-pixels in two images to be matched belong to the same region in 3D space;
wherein the plane parameter corresponding to the i-th super-pixel in the n-th image corresponds to one node in the MRF model, and, on top of the appearance features of the super-pixels, an energy function is built from multiple image cues such as the colinearity, connectivity and coplanarity between super-pixels, the correspondences between images, and depth:
where Ψ is the set of image pairs with correspondences, Q is the camera pose relative to the reference image n, and d_T is the depth value obtained by triangulation. The plane parameter of super-pixel i in image n is α_i^n, θ is the parameter describing the statistical properties of different image rows learnt from ground-truth depth data, Y_n is the confidence of the super-pixel plane parameters evaluated from single-image features in image n, and d_T^n denotes the approximate depths obtained by triangulation in image n.
Preferably, the first term of the energy function is computed as follows:
where x_{s_i}, s_i = 1, 2, ..., S_i denotes the features of image point s_i in super-pixel i, which consists of S_i pixels, R_{s_i} is the ray emitted from the camera optical centre through the 3D point s_i on super-pixel i, and θ_r is the parameter describing the statistical properties of different image rows learnt from ground-truth depth data.
Preferably, the second term of the energy function is computed as follows:
where the term penalizes the relative distance between two pixels s_i and s_j.
Preferably, the third term of the energy function is computed as follows:
In the above formula it is assumed that there are J_mn matched point pairs in images m and n.
Preferably, the fourth term of the energy function is computed as follows:
In image n, there are K_n points whose approximate depth d_T can be obtained by triangulation.
Preferably, in step S2, after the Markov random field model has been used to obtain the initial depth of each super-pixel in the images and the positional relationship between the different viewpoint images, high-level semantic priors are used as new constraints between super-pixels, and region merging is realized with a classical graph-based merging method.
Preferably, region merging is realized on the basis of a weighting function W(i, j) using the graph-based merging method, in which:
In the above formula, θ_ij is the angle between the respective normals of two neighbouring super-pixels i and j, and OBJ_ij represents the semantic class difference between the neighbours i and j obtained by a decision-tree-based semantic image labelling algorithm: if the two belong to the same semantic class, OBJ_ij is assigned a lower weight, otherwise a larger weight; a1 = a2 = 0.5.
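A minimal sketch of the merge weight W(i, j) described above. The published formula (13) is not reproduced in this text, so the exact form is assumed: a weighted sum of a normalized normal-angle term θ_ij and the semantic-difference term OBJ_ij, with a1 = a2 = 0.5 as stated, and assumed low/high values for OBJ_ij.

```python
import math

def merge_weight(theta_ij, same_semantic_class, a1=0.5, a2=0.5):
    # Hypothetical reconstruction of W(i, j): the angle term is 0 when
    # the two normals agree, and OBJ_ij is low when the semantic classes
    # match (both specific values are assumptions, not from the patent).
    angle_term = theta_ij / math.pi
    obj_ij = 0.1 if same_semantic_class else 0.9
    return a1 * angle_term + a2 * obj_ij
```

Under this reading, a lower W(i, j) means super-pixels i and j are more likely coplanar and should be merged first.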
The above technical scheme has the following advantages: the method uses an MRF model to jointly infer the 3D position and orientation of the super-pixels in the different viewpoint images, and combines high-level semantic priors to guide the 3D reconstruction process. At the same time, a recursive framework is used to realize progressive optimization of the scene depth. Under different wide-baseline environments the method obtains more stable and accurate 3D reconstruction than conventional methods.
Detailed description of the invention
The embodiments of the present invention are described more fully below with reference to the appended drawings. However, the drawings are for illustration only and are not intended to limit the scope of the invention.
Fig. 1 is an overall flow diagram of the wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization according to the present invention;
Fig. 2 shows the super-pixel-based image representation;
Fig. 3 shows the 2D geometric relationship between the plane parameter α and the ray R;
Fig. 4 is a block flow diagram of the method according to the present invention;
Fig. 5 shows the colinearity constraint between super-pixels in the method according to the present invention;
Fig. 6 compares depth maps before and after region merging in the method according to the present invention;
Fig. 7 shows the visual correlation between corresponding depth values at different viewpoints in the method according to the present invention;
Fig. 8 compares 3D scene models after testing on the "Stanford I" data set with the method of the invention;
Fig. 9 compares 3D scene models after testing on the "Stanford II" data set with the method of the invention;
Fig. 10 compares 3D scene models after testing on the "Stanford III" data set with the method of the invention;
Fig. 11 compares 3D scene models after testing on the "Stanford IV" data set with the method of the invention;
Fig. 12 shows 3D scene models created with the method of the present invention on different data sets.
Specific embodiment
The wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization of the present invention is described in detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a wide-baseline dense 3D scene reconstruction method based on semantic priors and progressive optimization comprises the steps:
S1, providing images from several different viewpoints and applying super-pixel segmentation to all images, dividing each into a set of locally homogeneous, irregularly shaped super-pixels;
S2, obtaining the initial depth of each super-pixel and the positional relationship between the different viewpoint images, and, using high-level semantic priors as a constraint, merging neighbouring super-pixel regions that lie in the same plane;
S3, performing depth estimation on all merged regions with a Markov random field (Markov Random Field, MRF) model to obtain an initial depth map;
S4, discarding erroneous depth estimates and removing redundant depth information from the initial depth map by depth fusion, obtaining the final 3D scene.
Specifically, the present invention captures a structured scene with a calibrated camera and pre-segments the input images into a set of locally homogeneous, irregularly shaped super-pixels using a graph-based unsupervised segmentation algorithm, as shown in Fig. 2. For a structured scene the assumption that "a 2D super-pixel is the projection of a 3D image patch" holds: the 3D patch must lie in the overlap region between the projection cone passing through the boundary of its 2D super-pixel and the 3D plane in which the patch lies. We parameterize the 3D position and orientation of the 3D patch that projects to a super-pixel by the plane parameter α, as shown in Fig. 3.
Any point P on this plane satisfies the constraint α^T P = 1, where 1/|α| is the distance from the camera optical centre to the plane and α/|α| is the unit normal vector of the plane (α itself is the plane parameter of the super-pixel, and scaling it by the distance 1/|α| from the camera optical centre to the plane yields the plane normal). If R_i is the normalized vector along the ray from the camera optical centre through the point i on the plane, then d_i = 1/(R_i^T α) is the distance from the camera optical centre to point i.
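The plane-parameter geometry above can be checked with a small numeric example (a sketch using NumPy, with toy values for the plane and ray): any point P on the plane satisfies α^T P = 1, and the depth along a unit ray R is d = 1/(R^T α).

```python
import numpy as np

alpha = np.array([0.0, 0.0, 0.2])   # plane parameter: the plane z = 5
R = np.array([0.0, 0.0, 1.0])       # unit ray along the optical axis

d = 1.0 / (R @ alpha)               # depth of the hit point: 1 / 0.2 = 5
P = d * R                           # the 3D point where the ray meets the plane

assert np.isclose(alpha @ P, 1.0)                    # alpha^T P = 1 holds
assert np.isclose(1.0 / np.linalg.norm(alpha), 5.0)  # 1/|alpha| = plane distance
```

The same relation d = 1/(R^T α) is what ties every pixel of a super-pixel to its plane parameter in the MRF energy terms below.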
To determine whether two super-pixels in two images to be matched belong to the same region in 3D space, we use a classical Markov random field model to infer the position and orientation of every super-pixel plane in the different viewpoint images. The plane parameter α_i^n of the i-th super-pixel in the n-th image corresponds to one node in the MRF model, and, on top of the appearance features of the super-pixels, an energy function is built from multiple image cues such as the colinearity, connectivity and coplanarity between super-pixels, the correspondences between images, and depth:
In formula (1), Ψ is the set of image pairs with correspondences, Q is the camera pose relative to the reference image n, and d_T is the depth value obtained by triangulation. The plane parameter of super-pixel i in image n is α_i^n, θ is the parameter describing the statistical properties of different image rows learnt from ground-truth depth data, Y_n is the confidence of the super-pixel plane parameters evaluated from single-image features in image n, and d_T^n denotes the approximate depths obtained by triangulation in image n.
The wide-baseline dense 3D scene reconstruction algorithm uses the recursive framework shown in Fig. 4: the depths inferred by the MRF model are used, together with the high-level semantic priors, to merge similar super-pixel regions in the segmentation map. The merged segmentation is then fed back into the MRF model for depth estimation, and the final 3D scene model is obtained by multi-view depth fusion in the final stage.
The first term of the energy function models the plane parameter α_n as a function of the single-image features X_n and penalizes it with the relative error commonly used as an evaluation criterion in SFM algorithms, i.e. the difference between the estimated depth d̂ and the ground-truth depth d: (d̂ - d)/d. We use x_{s_i}, s_i = 1, 2, ..., S_i to denote the features of image point s_i in super-pixel i, which consists of S_i pixels; then d_{s_i} = 1/(R_{s_i}^T α_i), where R_{s_i} is the ray emitted from the camera optical centre through the 3D point s_i on super-pixel i. If the estimated depth is d̂_{s_i}, the relative error is defined as shown in formula (2):
In the above, θ_r is the parameter describing the statistical properties of different image rows learnt from ground-truth depth data. The feature vector x ∈ R^526 is a real-valued vector composed of several kinds of image features computed separately. We convolve the image with Laws templates of size 3 × 3 to extract 9 texture energy values and 6 texture gradient values, and take the values of the two chroma channels in the YCbCr colour space. The feature filters can be described as E_n = Σ |I * F_n|^k, n = 1, 2, ..., 17, where k = 2, 4 gives 34 features for each super-pixel, I denotes the image and F_n the feature filters. The 14 shape and position features of a super-pixel are likewise extracted with the templates. To obtain more contextual information, the features of the four largest neighbouring super-pixels are collected at three different spatial scales. At the same time, photometric features are added to strengthen the feature description of super-pixels in a multi-view stereo environment. Finally, a super-pixel can be represented by a feature vector of 34 × (4 + 1) × 3 + 14 + 1 = 525 dimensions.
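The Laws texture-energy features above can be sketched as follows. The patent only states that 3 × 3 templates are used, so the specific mask (the outer product of the standard 1-D Laws "level" and "edge" vectors, giving the L3E3 mask) is an assumption, and the loop implements E = Σ |I * F|^k over an image window with k = 2, 4.

```python
import numpy as np

L3 = np.array([1.0, 2.0, 1.0])    # 1-D Laws "level" vector (assumed)
E3 = np.array([-1.0, 0.0, 1.0])   # 1-D Laws "edge" vector (assumed)
mask = np.outer(L3, E3)           # one 3x3 Laws mask (L3E3)

def texture_energy(img, F, k):
    # sum of |I * F|^k over all valid 3x3 windows (plain correlation)
    h, w = img.shape
    total = 0.0
    for y in range(h - 2):
        for x in range(w - 2):
            total += np.abs(np.sum(img[y:y+3, x:x+3] * F)) ** k
    return total

img = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 ramp image
e2 = texture_energy(img, mask, 2)               # k = 2 energy
e4 = texture_energy(img, mask, 4)               # k = 4 energy
```

With 17 such filters and k ∈ {2, 4} one obtains the 34 texture features per super-pixel mentioned in the text.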
The photometric-consistency measure is calculated as follows:
(a) Photometric normalization of the super-pixels projected at different viewpoints: compute the chroma vector c_k corresponding to the super-pixel projected at the k-th viewpoint, whose components are the colour means of all pixels in the projection;
(b) Estimate the cost of the super-pixel projection: using a Parzen window with a linear kernel function, compute a 20-bin statistical histogram for each normalized channel of the RGB colour space, and represent the histograms by a vector h_k. Then compute the difference between histograms with the Chi-squared distance;
(c) Find the set K of all viewpoints k in which the projection of the entire super-pixel lies within the image and the 3D patch is unoccluded in the k-th camera. Finally, we define the cost function below. In formula (4), the second term of C_k(i) represents the chroma difference of super-pixel i projected at the k-th viewpoint, and c_ref is the chroma vector computed at the reference viewpoint.
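Step (b) can be sketched as follows (a NumPy sketch; plain `np.histogram` stands in for the linear-kernel Parzen-window density estimate, and the common 0.5 · Σ (h1 - h2)² / (h1 + h2) form of the Chi-squared distance is assumed):

```python
import numpy as np

def channel_histogram(channel, bins=20):
    # 20-bin normalized histogram of one colour channel in [0, 1]
    h, _ = np.histogram(channel, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def chi_squared(h1, h2, eps=1e-12):
    # symmetric Chi-squared distance between two normalized histograms
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

rng = np.random.default_rng(0)
a = channel_histogram(rng.uniform(0, 1, 1000))  # one projection's channel
b = channel_histogram(rng.uniform(0, 1, 1000))  # another projection's channel

assert chi_squared(a, a) == 0.0   # identical projections: zero cost
assert chi_squared(a, b) >= 0.0   # the distance is non-negative
```

In the real method one such histogram is computed per normalized RGB channel of the super-pixel projection, and the channel distances feed the cost C_k(i).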
To minimize the accumulated relative error of all image points within a super-pixel, we model the relationship between the image features and the super-pixel plane parameters, as shown in formula (5). Here, θ_r ∈ R^526, r = 1, 2, ..., 11 are the parameter vectors to be estimated; they describe the statistical properties of the features corresponding to scenes at different distances (i.e. different image rows r) in the 2D image. The confidence of the depth prediction at point s_i is low when the local image features obtained are insufficient to predict the depth of s_i.
The second term of the energy function analyses the relationship between the plane parameters of two super-pixels i and j, modelling their colinearity, connectivity and coplanarity separately; these geometric relations can be expressed uniformly by formula (6):
In the above formula, N denotes the set of super-pixel pairs.
The method constrains the colinearity between super-pixels by selecting image points along long straight segments, which also helps to relate regions that are not directly adjacent to each other. We select two super-pixels i and j located at different positions on a straight segment that projects onto a line in the 2D image plane; a straight segment in the image may or may not correspond to a straight segment in 3D. We therefore pick an image point p on the segment: when p lies on the plane with parameter α_i its 3D position is s_j, and when it lies on the plane with parameter α_j its 3D position is s'_j. The energy term penalizes the relative distance between the points s_j and s'_j along the ray direction, defined as shown in formula (7):
In formula (7), the relative distance between s_j and s'_j is expressed in terms of the two plane parameters, and the confidence ν_ij is defined by the length and curvature information of the straight segment.
To realize the connectivity constraint, we select two pixels s_i and s_j on the boundary of super-pixels i and j and penalize the relative distance between them, thereby ensuring that they are fully connected. The relative distance between s_i and s_j is defined similarly to formula (8), where the binary variable y_ij = 0 when the two are not connected and y_ij = 1 when they are connected to each other.
The coplanarity constraint is defined similarly to the connectivity constraint: a third pair of points s''_i and s''_j is selected at the centre of each super-pixel to constrain the mutual coplanar structure. The relative distance, along the ray through s''_j, to the plane of super-pixel i is defined as follows:
In formula (9), when the two super-pixels i and j lie in the same plane the confidence is high. When the constraint is applied to two planes that are not connected to each other and far apart, the same scheme can be used: three such pixels are selected and their relative distances are penalized with the energy term.
A 3D point in the scene usually appears in several different viewpoint images. If a pixel p_n = (x_n, y_n, z_n) in one image matches a pixel p_m in another, the two points are likely to have the same 3D coordinates. We therefore likewise compute the relative distance between them:
p'_n - p_n = Q_mn [p_m; 1] - p_n = Q_mn [R_m / (R_m^T α_m); 1] - R_n / (R_n^T α_n) (10)
According to formula (10), we define the following energy term:
In formula (11), it is assumed that there are J_mn matched point pairs in images m and n. This term penalizes the 3D distance between the corresponding points of the two images in the same coordinate system. Even if the camera pose estimate Q is somewhat inaccurate, minimizing the term still drives the distance between corresponding points down, so the image correspondences can still be used in that case. In this way there is no need to obtain the 3D positions of the pixels through the bundle adjustment (Bundle Adjustment) algorithm during triangulation.
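Formula (10) can be illustrated numerically (a NumPy sketch with toy values chosen so the correspondence is exact): each matched pixel is back-projected through its super-pixel plane as p = R/(R^T α), p_m is mapped into image n's frame with the relative pose Q_mn, and the residual is the quantity the energy term penalizes.

```python
import numpy as np

def backproject(R, alpha):
    # p = R / (R^T alpha): 3D point where the unit ray R meets plane alpha
    R = R / np.linalg.norm(R)
    return R / (R @ alpha)

# Identity relative pose: both cameras share one frame in this toy case.
Q_mn = np.hstack([np.eye(3), np.zeros((3, 1))])   # 3x4 [R | t]

alpha = np.array([0.0, 0.0, 0.25])                # plane z = 4
R_m = np.array([0.1, 0.0, 1.0])                   # ray in image m
R_n = R_m.copy()                                  # matching ray in image n

p_m = backproject(R_m, alpha)
p_n = backproject(R_n, alpha)
residual = Q_mn @ np.append(p_m, 1.0) - p_n       # formula (10) residual

assert np.allclose(residual, 0.0)                 # perfect correspondence
```

A non-zero residual (from an inaccurate pose or wrong plane parameters) is exactly what formula (11) accumulates over the J_mn matched pairs.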
Further, in step S2, the initial depth values are obtained as follows:
In image n, some points can obtain an approximate depth d_T by triangulation. Since the depth values of these points are not accurate enough, we penalize the relative error between d_T and the estimated depth. Assuming K_n points obtain depth values by triangulation, we obtain the energy term shown in formula (12), in which a soft constraint is applied to the points in a plane so that their depth values equal the depths obtained by triangulation.
During the triangulation-based depth computation, we use the depths inferred from a single image to remove the scale ambiguity of the scene, and then optimize the pixel correspondences with bundle adjustment. Concretely: first, 128-dimensional SURF features are computed and the correspondences between pixels are found with the Euclidean distance; then bundle adjustment is used to compute the camera poses Q and the depths d_T of the matched pixels, and the depth values obtained after this triangulation are used in the energy term D_4(·); finally, the depths of image points computed from single-image features are used to effectively improve the matching precision in scenes containing repeated texture structure.
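The Euclidean-distance matching step can be sketched as nearest-neighbour association of descriptor vectors (a NumPy sketch; random 128-D vectors stand in for the SURF descriptors the patent uses):

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    # pairwise Euclidean distances, shape (len(desc_a), len(desc_b))
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    return d.argmin(axis=1)          # index of the best match in b for each a

rng = np.random.default_rng(1)
desc_b = rng.normal(size=(10, 128))  # stand-ins for 128-D SURF descriptors
perm = rng.permutation(10)
desc_a = desc_b[perm]                # a is b shuffled, so matches are known

matches = match_descriptors(desc_a, desc_b)
assert (matches == perm).all()       # each descriptor finds its twin
```

In practice a ratio test or cross-check would be added to reject ambiguous matches before the matched pairs are passed to bundle adjustment.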
To estimate the plane parameters α of the super-pixels, we maximize the conditional likelihood D(α | X, Y, d_T; θ). All the energy terms D_1, D_2 and D_4 correspond to L_1 norm terms, so the MAP inference of this Markov random field model can be solved with a classical linear programming algorithm. In solving, the following approximation is used: since the term D_3(·) is not convex, an estimate of α is first computed by linear programming, the estimate is substituted into D_3(·), the linear program is then re-solved, and the procedure terminates after several such recursions.
In addition, in step S2, the MRF inference yields the initial depth of each super-pixel in the image and the relative positional relationship between the different viewpoint images. Although the initial depths are not exactly accurate, particularly for distant regions, they help to form a first, relatively reliable constraint C1 on the orientation relationship between neighbouring super-pixels: in theory, if two adjacent super-pixels have the same plane parameter α, they are likely to lie on the same plane in the image. Using this constraint, the numerous initial super-pixels obtained by unsupervised over-segmentation can be reasonably merged, thereby reducing depth discontinuities inside regions. The relationship between two neighbouring super-pixels i and j can be measured by the angle θ_ij between their respective normals.
At the same time, we have also been respectively adopted high-level semantic priori and have come respectively as the new constraint between super-pixel, i.e., C2.For C2, the neighbouring super pixels for belonging to same semantic category should be subordinated to same plane with higher probability.So, I The weighting function W (i, j) that can be defined as follows:
In formula (13), θij and OBJij represent, respectively, the normal angle and the semantic-class difference between the adjacent super-pixels i and j obtained by the decision-tree-based image semantic labelling algorithm; OBJij is assigned a lower weight if the two belong to the same semantic category, and a larger weight otherwise. The normalized weight coefficients are set to a1 = a2 = 0.5. Using the weighting function W(i, j) of formula (13), we realize region merging with a classical graph-based segmentation method. The image is then represented by an undirected graph G(V, E), where V is the set of super-pixels in the image, E is the set of edges connecting any two super-pixels Vi and Vj, and each edge weight Eij is given by W(i, j).
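A minimal sketch of merging such a weighted super-pixel graph follows. Because formula (13) itself appears only as an image, the form of W(i, j) below — a convex combination of the normalized normal angle and a binary semantic-difference penalty with a1 = a2 = 0.5 — is an assumption; the threshold-based edge contraction is likewise only a stand-in for the classical graph-based segmentation referenced in the text.

```python
def edge_weight(theta_ij, same_class, a1=0.5, a2=0.5):
    # Assumed form of W(i, j): small when the normals of super-pixels
    # i and j are nearly parallel and they share a semantic class.
    obj_ij = 0.0 if same_class else 1.0
    return a1 * (theta_ij / 180.0) + a2 * obj_ij

def merge_superpixels(n, edges, tau):
    """Greedy merging over the undirected graph G(V, E): contract every
    edge (i, j, w) whose weight w falls below a threshold tau, using a
    union-find structure, and return a merged label per super-pixel."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, w in sorted(edges, key=lambda e: e[2]):
        if w < tau:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
    return [find(i) for i in range(n)]
```

Low-weight edges (coplanar, same-class neighbours) are contracted first, so merged regions grow along surfaces rather than across semantic boundaries.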
Fig. 6 gives a typical example based on the "Merton College III" image set. The second and third images of Fig. 6(a) compare the segmentation maps before and after region merging; Fig. 6(b) compares the semantic annotation results before and after region merging; and the first and second columns of Fig. 6(c) show the initial and final depth estimates of the scene, respectively. It is easy to see that, by combining the high-level semantic prior with depth information, neighbouring super-pixel regions lying on the same plane are merged well and then used in the next round of recursive depth estimation; notably, the improvement in image semantic labelling is also evident. The initial depth map also shows that, because the initial super-pixel division contains many planes of very small size, discontinuities in the depth estimate arise easily: highly significant discontinuities of this kind appear on the right side of the three-dimensional building model in the figure. After super-pixel merging, however, the final depth map shows that these depth-estimation discontinuities are significantly improved.
Furthermore, in step S4, the generated initial depth maps usually contain a certain degree of error, which easily causes the same three-dimensional point to correspond to different depth values in different-view images. Erroneous depth estimates therefore need to be discarded and redundant depth information removed by means of depth fusion, so as to obtain a more accurate and compact depth estimate. Here we select the view occupying the centre of all the multi-view images as the reference view (if there are only two views, either may serve as the reference view), and project the depth maps of the remaining views into the reference depth map to analyse the positional relationships between the different depth values and three-dimensional points. For depth fusion, a stability-based strategy is used, in which the stability measure of each depth value is defined as the difference between the number of depth maps that occlude the three-dimensional point in the reference view and the number of depth maps whose free-space constraint it violates. Fig. 7 shows the three different types of visibility relations that can exist between a three-dimensional point of the reference view and the corresponding three-dimensional points of the remaining views: (a) the three-dimensional point A observed by the reference view lies in front of the three-dimensional point A' observed by view i, violating the free-space constraint of A'; (b) the same point B = B' is observed by both views; (c) the three-dimensional point C observed by the reference view is occluded by the three-dimensional point C' observed by view i.
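The stability measure just defined reduces to a simple count per depth value. The sign conventions in this sketch — a projected depth in front of the reference depth counts as an occlusion, one behind it as a free-space violation — follow the usual stability-based fusion formulation and are an assumption about the exact implementation.

```python
def stability(d_ref, projected_depths, eps=1e-2):
    """Stability of one reference-view depth value: the number of
    other-view depth maps whose surface occludes the 3D point minus the
    number whose free-space constraint the point violates."""
    occluding = sum(1 for d in projected_depths if d < d_ref - eps)
    violating = sum(1 for d in projected_depths if d > d_ref + eps)
    return occluding - violating

def is_stable(d_ref, projected_depths):
    # Fused depth values are required to have non-negative stability.
    return stability(d_ref, projected_depths) >= 0
```

Among the candidates that pass this test, the one nearest to the reference camera is retained, matching the fusion constraint described in the next paragraph.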
During depth fusion, we evaluate the stability of each depth value together with the distance between each pixel of the reference camera image and its corresponding three-dimensional point; a fused, stable depth value must have non-negative stability and be the candidate nearest to the reference camera. The resulting stable depth map then undergoes post-processing operations such as bilateral-filter-based depth smoothing and hole filling, thereby achieving a more accurate scene reconstruction.
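The bilateral-filter-based smoothing mentioned above can be sketched as a brute-force loop (illustrative only, not the implementation used in the patent); skipping invalid pixels lets the same loop act as a crude form of hole filling.

```python
import numpy as np

def bilateral_filter_depth(depth, sigma_s=2.0, sigma_r=0.1, radius=2):
    """Bilateral smoothing of a depth map: the spatial Gaussian keeps the
    filter local, while the range Gaussian on depth differences preserves
    depth discontinuities.  Pixels with depth <= 0 are treated as holes."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and depth[yy, xx] > 0:
                        w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        if depth[y, x] > 0:
                            dr = depth[yy, xx] - depth[y, x]
                            w_r = np.exp(-(dr * dr) / (2 * sigma_r ** 2))
                        else:
                            w_r = 1.0  # hole: average the valid neighbours
                        num += w_s * w_r * depth[yy, xx]
                        den += w_s * w_r
            out[y, x] = num / den if den > 0 else 0.0
    return out
```

The small range sigma keeps depth edges between buildings and ground sharp, while holes are filled from whatever valid neighbours fall inside the window.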
The experimental results of the above method are analysed below:
The application uses not only multiple groups of wide-baseline images of the Stanford University campus ("Stanford I, II, III and IV") as experimental image sets, but also multi-view image data sets that satisfy the wide-baseline condition, such as "Merton College", "University Library" and "Wadham College".
Since a ground-truth three-dimensional model of the scene is difficult to obtain, we compare our algorithm only qualitatively against classical multi-view three-dimensional reconstruction algorithms that do not incorporate high-level image semantics, on the following eight different wide-baseline image sets, to test their respective performance. At the same time, SIFT matching optimized with RANSAC is used to evaluate the degree of camera pose variation between the different-view images, and sky regions irrelevant to the three-dimensional scene model are removed by a seed-expansion-based region growing method.
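The RANSAC-optimized matching step can be illustrated with a generic consensus loop. This is a sketch only: in the actual pipeline, `fit` and `error` would be a fundamental-matrix estimator and an epipolar distance over SIFT correspondences, whereas the toy model used below is a simple 1-D offset.

```python
import random

def ransac(matches, fit, error, n_sample, tol, iters=200, seed=0):
    """Generic RANSAC loop: repeatedly fit a model to a random minimal
    sample of the putative matches and keep the model with the largest
    consensus (inlier) set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(matches, n_sample)
        model = fit(sample)
        inliers = [m for m in matches if error(model, m) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Pruning SIFT matches this way is what makes the surviving correspondence counts (38 pairs, 8 pairs, 0 pairs, and so on, reported below) a meaningful indicator of how wide each baseline is.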
(1) " Stanford I " data set
"Stanford I", the first data set, consists of only 2 images. The main camera motion is a small rotation around the optical centre, and SIFT matching between the images yields 38 pairs of corresponding feature points, so strictly speaking the set does not satisfy the wide-baseline condition. Fig. 8(a) and (b) show the scene models, observed from different angles, obtained by the baseline method and by the method of this paper, respectively. As can be clearly seen in the top-right image of Fig. 8(a), the two differently oriented planes of the distant building are inferred as two regions of different depth. In Fig. 8(b), the depth variation of the large regions obtained by our method, which incorporates high-level image semantics, is continuous and describes the actual scene relatively accurately.
(2) " Stanford II " data set
"Stanford II", the second data set, consists of 3 images. The camera again performs roughly a rotation around the optical centre, but its rotation amplitude is clearly larger than in the "Stanford I" data set, and a more significant translation is present. Consequently, SIFT matching yields only 8 and 0 pairs of corresponding feature points for the two image pairs, respectively. Fig. 9(a) and (b) show the scene models, observed from different angles, obtained by the baseline method and by our method. Comparison readily shows that our method better estimates the pose relationships between the different-view images and therefore obtains a scene reconstruction that better matches the actual scene: seen from above, the building model fragments represented by the different views lie essentially on the same horizontal line, as the comparison of the top-right images of Fig. 10(a) and (b) shows. In addition, the scene model obtained by our method eliminates many regions whose original depth was cluttered, although depth over-smoothing can also occur; for example, the depth of the arched doorway region of the central building is smoothed to resemble the walls on either side, which obviously does not match the actual situation.
(3) " Stanford III " data set
"Stanford III", the third data set, consists of 2 wide-baseline images; the camera performs large rotation and translation simultaneously, and SIFT matching is again unable to find any corresponding feature points. Fig. 10(a) and (b) show the scene models, observed from different angles, obtained by the baseline method and by our method. Comparison readily shows that our method obtains a more accurate camera pose estimate, which benefits from the depth optimization brought by the region merging based on the high-level semantic prior. Moreover, the three-dimensional scene model obtained by our method describes the geometric relationships between the differently oriented faces of the building more precisely.
(4) " Stanford IV " data set
"Stanford IV", the fourth data set, consists of 4 wide-baseline images; the camera again performs a series of large rotations and translations. SIFT matching yields only 13 pairs of corresponding feature points from the first two images, and again no corresponding points at all for the other image pairs. Fig. 11(a) and (b) show the scene models, observed from different angles, obtained by the baseline method and by our method. In Fig. 11(a), the depth estimates of the nearby arched-gate building on the left and of the green belt and ground in front of it show discontinuities, and the depth estimates of the distant buildings and trees behind contain a large number of errors, as is clearly visible in the second image of the second row. In contrast, our method achieves a significant improvement: the three-dimensional scene model obtained by our method in Fig. 11(b) eliminates the problems of the baseline method, not only estimating the camera motion and pose changes between the multiple different-view images relatively accurately, but also producing a global depth map that reflects the real scene well.
(5) Other wide-baseline image sets
In addition, we test the performance of our method on the three classical wide-baseline data sets "Merton College III", "University Library" and "Wadham College". As can be seen in Fig. 12, for outdoor structured scenes under different wide-baseline conditions, our method still obtains three-dimensional models that closely describe the real scenes.
The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization described above has the following features: (1) Super-pixels are used as the geometric primitives for image representation. The advantages of this are: first, compared with pixels, the larger area of a super-pixel helps reduce region-matching ambiguity in weakly textured environments; second, the real boundaries of objects in the scene and the discontinuities of depth are better reflected; third, when the energy minimization is solved, the number of super-pixel-based graph nodes is far smaller than the number of pixel-based graph nodes, so the computational complexity is lower. (2) Rich low-level feature information is exploited on the basis of single images, combined with high-level semantic priors to improve the scene reconstruction. (3) Scene depth is optimized in a recursive fashion: the scene depth estimated by the model, together with the semantic priors, guides the unsupervised image segmentation, and the updated segmentation map is used for the next round of depth estimation.
This application has illustrated how, under wide-baseline conditions, multiple image features can be combined with triangulation-based geometric features to build accurate three-dimensional scene models. The method of this application uses an MRF model to jointly infer the three-dimensional positions and orientations of the super-pixels in the different-view images, and incorporates high-level semantic priors to guide the three-dimensional reconstruction process. At the same time, a recursive framework is used to realize the progressive optimization of scene depth. Experiments demonstrate that, under different wide-baseline conditions, the method obtains more stable and more accurate three-dimensional reconstructions than conventional methods.
Various changes and modifications will undoubtedly be evident to those skilled in the art after reading the above description. The appended claims should therefore be regarded as covering all changes and modifications within the true intent and scope of the invention. Any and all equivalents within the scope of the claims are considered to remain within the intent and scope of the invention.

Claims (8)

1. A wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization, characterized in that the method comprises the steps of:
S1, providing images from several different views, and performing super-pixel segmentation on all of the images, dividing each into a set of locally homogeneous, irregularly shaped super-pixels;
S2, obtaining the initial depth of each super-pixel in the images and the positional relationships between the different-view images, and, with high-level semantic priors as a constraint condition, merging neighbouring super-pixel regions that lie on the same plane;
S3, performing depth estimation on all of the merged regions using a Markov random field model to obtain initial depth maps;
S4, discarding erroneous depth estimates by means of depth fusion and removing redundant depth information from the initial depth maps, to obtain the final three-dimensional scene.
2. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 1, characterized in that, in step S2, when region merging is carried out, a Markov random field model is used to infer the positions and orientations of the super-pixel planes in the different-view images, so as to determine whether two super-pixels in the two images to be matched belong to the same region in three-dimensional space;
wherein the plane parameter of the i-th super-pixel in the n-th image corresponds to a node in the MRF model, and, on the basis of the appearance features of the super-pixels, an energy function is created by combining multiple image features such as the collinearity, connectivity and coplanarity between super-pixels, the correspondences between images, and depth:
wherein Ψ is the set of images for which correspondences exist, X is the camera pose corresponding to reference image n, and dT is the depth value obtained by triangulation; the plane parameter corresponding to super-pixel i in image n is αin; θ is a parameter, learned from Ground Truth depth data, that describes the statistical properties of different image rows; Yn is the confidence corresponding to the super-pixel plane parameters estimated in image n from single-image features; and dTn refers to the approximate depth obtained in image n by triangulation.
3. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 2, characterized in that the first term of the energy function is calculated as:
wherein the feature term denotes the features of the image points si in the super-pixel i composed of Si pixels, the ray term is the ray emitted from the camera optical centre through the three-dimensional point si on super-pixel i, and θr is a parameter, learned from Ground Truth depth data, that describes the statistical properties of different image rows.
4. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 2, characterized in that the second term of the energy function is calculated as:
wherein the distance term denotes the relative distance between the two pixels si and sj.
5. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 2, characterized in that the third term of the energy function is calculated as:
in the above formula, it is assumed that there are Jmn matching point pairs between images m and n,
and
6. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 2, characterized in that the fourth term of the energy function is calculated as:
wherein, in image n, there are Kn points for which the approximate depth dT can be obtained by triangulation.
7. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 1, characterized in that, in step S2, after the Markov random field model has been used to obtain the initial depth of each super-pixel in the images and the positional relationships between the different-view images, high-level semantic priors are used as new constraints between super-pixels, and a graph-based segmentation method is then used to realize region merging.
8. The wide-baseline dense three-dimensional scene reconstruction method based on semantic priors and progressive optimization according to claim 7, characterized in that region merging is realized with a graph-based segmentation method on the basis of the weighting function W(i, j), wherein:
in the above formula, θij is the angle between the respective normals of the two neighbouring super-pixels i and j, and OBJij represents the semantic-class difference between the adjacent super-pixels i and j obtained by the decision-tree-based image semantic labelling algorithm; OBJij is assigned a lower weight if the two belong to the same semantic category, and a larger weight otherwise; and a1 = a2 = 0.5.
CN201811157420.5A 2018-09-30 2018-09-30 Based on semantic priori and the wide baseline densification method for reconstructing three-dimensional scene of gradual optimization Withdrawn CN109255833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811157420.5A CN109255833A (en) 2018-09-30 2018-09-30 Based on semantic priori and the wide baseline densification method for reconstructing three-dimensional scene of gradual optimization


Publications (1)

Publication Number Publication Date
CN109255833A true CN109255833A (en) 2019-01-22

Family

ID=65045338




Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111508010A (en) * 2019-01-31 2020-08-07 北京地平线机器人技术研发有限公司 Method and device for depth estimation of two-dimensional image and electronic equipment
CN111508010B (en) * 2019-01-31 2023-08-08 北京地平线机器人技术研发有限公司 Method and device for estimating depth of two-dimensional image and electronic equipment
CN109903379A (en) * 2019-03-05 2019-06-18 电子科技大学 A kind of three-dimensional rebuilding method based on spots cloud optimization sampling
CN110288712A (en) * 2019-03-30 2019-09-27 天津大学 The sparse multi-view angle three-dimensional method for reconstructing of indoor scene
CN110009625A (en) * 2019-04-11 2019-07-12 上海科技大学 Image processing system, method, terminal and medium based on deep learning
CN110009625B (en) * 2019-04-11 2021-02-12 上海科技大学 Image processing system, method, terminal and medium based on deep learning
CN110910437A (en) * 2019-11-07 2020-03-24 大连理工大学 Depth prediction method for complex indoor scene
CN110910437B (en) * 2019-11-07 2021-11-05 大连理工大学 Depth prediction method for complex indoor scene
CN111325347A (en) * 2020-02-19 2020-06-23 山东大学 Automatic danger early warning description generation method based on interpretable visual reasoning model
CN111414923A (en) * 2020-03-05 2020-07-14 南昌航空大学 Indoor scene three-dimensional reconstruction method and system based on single RGB image
CN111414923B (en) * 2020-03-05 2022-07-12 南昌航空大学 Indoor scene three-dimensional reconstruction method and system based on single RGB image
CN113822919A (en) * 2021-11-24 2021-12-21 中国海洋大学 Underwater image relative depth estimation method based on semantic information constraint
CN113822919B (en) * 2021-11-24 2022-02-25 中国海洋大学 Underwater image relative depth estimation method based on semantic information constraint
CN115115797A (en) * 2022-08-25 2022-09-27 清华大学 Large-scene sparse light field semantic driving intelligent reconstruction method, system and device
CN115409886A (en) * 2022-11-02 2022-11-29 南京航空航天大学 Part geometric feature measuring method, device and system based on point cloud


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190122