Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a schematic flowchart of a depth reconstruction method based on superpixel relationship analysis according to an embodiment of the present disclosure. As shown in Fig. 1, the method may include:
Step S11: determining a feature point matching result M between adjacent image frames in an image frame set and a superpixel segmentation result S of a target image frame, where the image frame set includes at least two image frames, the target image frame is one frame in the image frame set, and the superpixel segmentation result S includes a plurality of superpixels.
Step S12: determining, according to the feature point matching result M and the superpixel segmentation result S, a homography matrix H corresponding to each superpixel and a motion relationship R_e between adjacent superpixels.
Step S13: determining a spatial relationship R_s between adjacent superpixels according to the motion relationship R_e between adjacent superpixels.
Step S14: determining a plane parameter θ of each superpixel according to the spatial relationship R_s between adjacent superpixels.
Step S15: performing depth reconstruction on the dynamic scene corresponding to the image frame set according to the homography matrix H corresponding to each superpixel and the plane parameter θ of each superpixel.
The spatial relationship between superpixels is determined from the motion relationship between superpixels, so that the plane parameter of each superpixel can be determined and depth reconstruction of a complex motion scene is effectively realized.
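The data flow of steps S11 to S15 can be sketched as a simple chain of stages. The following is only an illustrative sketch: the function name reconstruct_depth and the five stage callables are hypothetical placeholders supplied by the caller, and nothing here reflects a specific implementation of the disclosure.

```python
from typing import Any, Callable, Sequence


def reconstruct_depth(
    frames: Sequence[Any],
    target_index: int,
    preprocess: Callable,         # S11: (frames, target_index) -> (M, S)
    analyze_motion: Callable,     # S12: (M, S) -> (H, R_e)
    motion_to_spatial: Callable,  # S13: R_e -> R_s
    fit_planes: Callable,         # S14: R_s -> theta
    build_depth: Callable,        # S15: (H, theta) -> depth result
):
    """Chain steps S11-S15; every stage is a caller-supplied placeholder."""
    if len(frames) < 2:
        raise ValueError("the image frame set must contain at least two frames")
    M, S = preprocess(frames, target_index)   # matches and superpixels
    H, R_e = analyze_motion(M, S)             # homographies and motion relations
    R_s = motion_to_spatial(R_e)              # spatial relations
    theta = fit_planes(R_s)                   # plane parameters
    return build_depth(H, theta)              # depth reconstruction
```

Passing the stages in as callables keeps the sketch independent of any particular matching, segmentation, or optimization algorithm.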
Fig. 2 shows a schematic diagram of a depth reconstruction system based on superpixel relationship analysis according to an embodiment of the present disclosure. As shown in Fig. 2, the depth reconstruction system includes five modules: a preprocessing module, a relationship analysis module, a motion selection module, a reconstruction module, and an optimization module.
The at least two image frames included in the image frame set are input into the preprocessing module, which preprocesses them and outputs the feature point matching result M between adjacent image frames and the superpixel segmentation result S of the target image frame.
In one example, the feature point matching result M between adjacent image frames is determined by the CPM algorithm.
It should be noted that algorithms other than the CPM algorithm may also be used to determine the feature point matching result M between adjacent image frames, and the present disclosure is not limited in this respect.
In one example, the superpixel segmentation result S of the target image frame is determined by the SLIC algorithm.
It should be noted that algorithms other than the SLIC algorithm may also be used to determine the superpixel segmentation result S of the target image frame, and the present disclosure is not limited in this respect.
Two relationships are defined between adjacent superpixels: a motion relationship and a spatial relationship. The motion relationship is related to the homography matrix H and can be obtained from point matches, while the spatial relationship is determined by the plane parameter θ. The motion relationship and the spatial relationship each have three relationship types: Coplanar, Hinge, and Crack. Table 1 shows the specific definition of the motion relationship, and Table 2 shows the specific definition of the spatial relationship.
TABLE 1
TABLE 2
Here, H_i is the homography matrix corresponding to superpixel S_i, θ_i is the plane parameter of superpixel S_i, p is a pixel point expressed in canonical form, and B_{i,j} is the set of pixels on the common boundary of adjacent superpixels S_i and S_j.
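As an illustration of the boundary set B_{i,j}, the sketch below collects, for every pair of adjacent superpixels in a label map, the pixels lying on their common boundary. It rests on assumptions not stated in the source: the label map is a list of rows of integer superpixel ids, adjacency is 4-connected, and a boundary pixel is any pixel with a direct neighbour carrying the other label. The name boundary_sets is hypothetical.

```python
from collections import defaultdict


def boundary_sets(labels):
    """Map each adjacent label pair (i, j), i < j, to its boundary pixels B_{i,j}.

    labels is a list of rows; labels[y][x] is the superpixel id of pixel (y, x).
    """
    height, width = len(labels), len(labels[0])
    boundaries = defaultdict(set)
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            # Look right and down only, so each neighbouring pixel pair is visited once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width:
                    b = labels[ny][nx]
                    if a != b:
                        key = (min(a, b), max(a, b))
                        boundaries[key].update({(y, x), (ny, nx)})
    return dict(boundaries)
```

The keys of the returned dictionary also give the set of adjacent superpixel pairs.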
Still taking Fig. 2 as an example, the feature point matching result M between adjacent image frames and the superpixel segmentation result S of the target image frame output by the preprocessing module are input into the relationship analysis module. According to M and S, the relationship analysis module determines the homography matrix H corresponding to each superpixel and the motion relationship R_e between adjacent superpixels.
In one possible implementation, determining the homography matrix H corresponding to each superpixel and the motion relationship R_e between adjacent superpixels according to the feature point matching result M and the superpixel segmentation result S includes: for any superpixel i, optimizing a first energy function E(H, R_e) to obtain the homography matrix H_i corresponding to superpixel i and the motion relationship R_e(i, j) between superpixel i and an adjacent superpixel j. The first energy function E(H, R_e) includes: a first optimization term E_data(H_i) related to the homography matrix H_i corresponding to superpixel i; a second optimization term E_pair(H_i, H_j, R_e(i, j)) related to the homography matrix H_i corresponding to superpixel i, the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j, and the homography matrix H_j corresponding to the adjacent superpixel j; and a constant term E_o(R_e(i, j)) related to the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j.
In one possible implementation, the homography matrix H_j corresponding to the adjacent superpixel j is an already-optimized homography matrix.
For example, the first energy function E(H, R_e) is expressed as follows:
By optimizing the first energy function, the homography matrix H_i corresponding to superpixel i and the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j are determined. Here, E is the set of adjacent superpixel pairs corresponding to superpixel i, and λ_s1 and λ_s2 are weight coefficients.
In the first energy function E(H, R_e), the first optimization term E_data(H_i), which is related to the homography matrix H_i corresponding to superpixel i, is used to match the feature point matching result M and is expressed as follows:
where |Z_i| is a normalization parameter, Su(S_i) is the set of support matches of superpixel i, τ_g is a threshold parameter, and ω_c is expressed as follows:
ω_c(S_i, p_l) = exp(−ω_s(S_i, S_j)/γ), p_l ∈ S_j.
where ω_s is the geodesic distance between the centers of superpixel i and superpixel j, and γ is a constant parameter.
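The weight ω_c above decays exponentially with the geodesic distance ω_s, which can be checked with a few lines of code. This is a minimal sketch: the function name match_weight is hypothetical, and the geodesic distance is assumed to be computed elsewhere and passed in as a number.

```python
import math


def match_weight(geodesic_distance, gamma):
    """Evaluate omega_c = exp(-omega_s / gamma) for one point match.

    geodesic_distance plays the role of omega_s(S_i, S_j); gamma is the
    constant parameter and must be positive for the weight to decay.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return math.exp(-geodesic_distance / gamma)
```

A match inside the superpixel itself (distance 0) receives weight 1, and matches from more distant superpixels contribute progressively less.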
In the first energy function E(H, R_e), the second optimization term E_pair(H_i, H_j, R_e(i, j)), which is related to the homography matrix H_i corresponding to superpixel i, the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j, and the homography matrix H_j corresponding to the adjacent superpixel j, is expressed as follows:
In the first energy function E(H, R_e), the constant term E_o(R_e(i, j)), which is related to the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j, is expressed as follows:
When the image frame set includes a plurality of image frames, the first energy function E(H, R_e) further includes a prior term E_mul(R_e(i, j)) related to the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j. In this case, the first energy function E(H, R_e) is expressed as follows:
where the prior term E_mul(R_e(i, j)) is added according to the tracked superpixel relationships and is expressed as follows:
where R_p(i, j) is the prior given by the tracked motion relationship, together with a count of how many frames that relationship has held, and τ_m is a threshold parameter.
Determining the homography matrix H_i corresponding to superpixel i and the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j by optimizing the first energy function E(H, R_e) includes:
First, the homography matrix H_i is randomly initialized. The homography matrix H_i may be initialized from the feature point matching result M or by other methods, and the present disclosure does not specifically limit this.
Secondly, the second optimization term E_pair(H_i, H_j, R_e(i, j)) is determined according to the optimized homography matrix H_j corresponding to the adjacent superpixel, and all point matches involved are treated as a local support vector set for the optimization.
Finally, following a fast propagation method, the new support vector set is used to optimize the homography matrix H_i by randomly generating model parameters.
The optimization process may be ended after a set number of optimization iterations, when the homography matrix H_i converges, or in other ways, and the present disclosure does not specifically limit this.
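The two stopping rules mentioned above (a fixed number of iterations, or convergence) can be illustrated with a generic random-proposal loop. This sketch is not the propagation scheme of the disclosure: it simply keeps a randomly generated candidate whenever the candidate lowers a caller-supplied energy. The names refine, energy, propose, tol, and patience are all hypothetical.

```python
import random


def refine(initial, energy, propose, max_iters=100, tol=1e-9, patience=10, seed=0):
    """Minimise energy by random proposals with two stopping rules.

    energy(model) returns a number; propose(model, rng) returns a candidate.
    Stops after max_iters proposals, or after `patience` consecutive
    proposals that fail to improve the energy by more than tol.
    """
    rng = random.Random(seed)
    best, best_energy = initial, energy(initial)
    stalled = 0
    for _ in range(max_iters):              # stop rule 1: iteration budget
        candidate = propose(best, rng)
        candidate_energy = energy(candidate)
        if best_energy - candidate_energy > tol:
            best, best_energy = candidate, candidate_energy
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:         # stop rule 2: treated as converged
                break
    return best, best_energy
```

Because a candidate is accepted only when it lowers the energy, the returned energy is never worse than that of the initial model.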
By optimizing the first energy function E(H, R_e) as described above, the homography matrix H corresponding to each superpixel and the motion relationship R_e between adjacent superpixels can be obtained.
In one possible implementation, determining the spatial relationship R_s between adjacent superpixels according to the motion relationship R_e between adjacent superpixels includes: determining the relationship type of the motion relationship R_e between adjacent superpixels as the relationship type of the spatial relationship R_s between the adjacent superpixels, where the relationship types include Coplanar, Hinge, and Crack.
In practical applications, the motion relationship and the spatial relationship between adjacent superpixels are in one-to-one correspondence. Therefore, after the motion relationship R_e between adjacent superpixels is obtained, the relationship type of the motion relationship R_e is determined as the relationship type of the spatial relationship R_s between the adjacent superpixels. That is, when the relationship type of the motion relationship R_e between adjacent superpixels is Coplanar, the relationship type of the spatial relationship R_s is also Coplanar; when the relationship type of R_e is Hinge, the relationship type of R_s is also Hinge; and when the relationship type of R_e is Crack, the relationship type of R_s is also Crack.
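The one-to-one correspondence between relationship types can be sketched as a direct copy from R_e to R_s. The enum Relation, the function spatial_from_motion, and the dictionary layout keyed by superpixel pairs are illustrative assumptions, not part of the disclosure.

```python
from enum import Enum


class Relation(Enum):
    COPLANAR = "Coplanar"
    HINGE = "Hinge"
    CRACK = "Crack"


def spatial_from_motion(motion_relations):
    """Give each superpixel pair the same type in R_s as it has in R_e.

    motion_relations maps a pair (i, j) to a Relation member or to one of
    the strings "Coplanar", "Hinge", "Crack".
    """
    return {pair: Relation(rel) for pair, rel in motion_relations.items()}
```

Using an enum rejects any relationship type other than the three defined ones, since Relation(...) raises ValueError for unknown values.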
Still taking Fig. 2 as an example, the relationship analysis module outputs the homography matrix H corresponding to each superpixel and the spatial relationship R_s between adjacent superpixels to the motion selection module.
In one possible implementation, determining the plane parameter θ of each superpixel according to the spatial relationship R_s between adjacent superpixels includes: determining a reference superpixel set S_t among the plurality of superpixels based on a preset algorithm, where the reference superpixel set S_t corresponds to a background portion of the target image frame; determining the scale factor s of each reference superpixel in the reference superpixel set S_t as 1; and determining the plane parameter θ of each superpixel according to the reference superpixel set S_t and the spatial relationship R_s between adjacent superpixels.
Still taking Fig. 2 as an example, the motion selection module determines the reference superpixel set S_t among the plurality of superpixels based on the preset algorithm, and decomposes the homography matrix H_i corresponding to each superpixel to obtain the inverse depth d_i and the plane normal vector n_i corresponding to each superpixel. The motion selection module then outputs the reference superpixel set S_t, the spatial relationship R_s between adjacent superpixels, and the inverse depth d_i and plane normal vector n_i corresponding to each superpixel to the reconstruction module, so that the reconstruction module determines the plane parameter θ of each superpixel.
In one possible implementation, determining the plane parameter θ of each superpixel according to the reference superpixel set S_t and the spatial relationship R_s between adjacent superpixels includes: for any superpixel i, optimizing a second energy function E(θ, s) to obtain the plane parameter θ_i and the scale factor s_i of superpixel i. The second energy function E(θ, s) includes: a third optimization term E_fit(θ_i, s_i) related to the plane parameter θ_i and the scale factor s_i of superpixel i; a fourth optimization term E_Rel(θ_i, θ_j) related to the plane parameter θ_i of superpixel i and the plane parameter θ_j of the adjacent superpixel j corresponding to superpixel i; and a constant term E_occ(s_i) related to the scale factor s_i of superpixel i.
In one possible implementation, when the relationship types of the spatial relationships R_s(i, j) between superpixel i and all adjacent superpixels j are all Crack, the second energy function E(θ, s) further includes a fifth optimization term E_pair(θ_i, θ_k) related to the plane parameter θ_i of superpixel i and the plane parameter θ_k of a superpixel k, where superpixel k is a superpixel outside the superpixel set S_r, the superpixel set S_r consists of the superpixels that can reach the reference superpixel set S_t along a target path, and every superpixel relationship on the target path is of type Coplanar or Hinge.
In one possible implementation, the plane parameter θ_j of the adjacent superpixel j and the plane parameter θ_k of superpixel k are already-optimized plane parameters.
For example, the second energy function E(θ, s) is expressed as follows:
where λ_r1, λ_r2, and λ_r3 are weight coefficients.
In the second energy function E(θ, s), the fourth optimization term E_Rel(θ_i, θ_j), which is related to the plane parameter θ_i of superpixel i and the plane parameter θ_j of the adjacent superpixel j corresponding to superpixel i, is expressed as follows:
In the second energy function E(θ, s), the constant term E_occ(s_i), which is related to the scale factor s_i of superpixel i, is expressed as follows:
E_occ(s_i) = δ(s_i ≠ 1).
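The term E_occ can be illustrated in code under the assumption that δ(·) is an indicator function returning 1 when its condition holds and 0 otherwise; the source does not reproduce the definition of δ, so this reading is an assumption. The function names occlusion_term and total_occlusion are hypothetical.

```python
def occlusion_term(scale):
    """E_occ(s_i) under the assumed indicator reading of delta: 1 if s_i != 1, else 0."""
    return 1 if scale != 1 else 0


def total_occlusion(scales):
    """Sum E_occ over all superpixels; counts the scale factors that differ from 1."""
    return sum(occlusion_term(s) for s in scales)
```

Under this reading, reference superpixels, whose scale factor is fixed to 1, contribute nothing to the term, while every superpixel with a different scale factor adds a constant cost.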
In the second energy function E(θ, s), the third optimization term E_fit(θ_i, s_i), which is related to the plane parameter θ_i and the scale factor s_i of superpixel i, is mainly used to preserve local characteristics and is expressed as follows:
where τ_r is a threshold parameter.
S_r is defined as the set of all superpixels that can reach the reference superpixel set S_t. A superpixel that is not in S_t is said to be able to reach S_t if and only if a target path can be found from that superpixel to S_t such that every superpixel relationship on the target path is of type Coplanar or Hinge. In this case, the following is defined:
E_c = {(i, k) | (i, k) ∈ E, S_i ∈ S_r, S_k ∈ S \ S_r}.
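The reachable set S_r and the pair set E_c can be computed with a breadth-first search over the superpixel adjacency graph, following only Coplanar and Hinge relationships. This is a sketch under assumptions: superpixels are identified by integers, relationships are stored as a dictionary from pairs to type strings, and the reference superpixels themselves are counted as members of S_r. The names reachable_set and crossing_pairs are hypothetical.

```python
from collections import deque


def reachable_set(reference, relations):
    """Return S_r: superpixels linked to the reference set S_t by Coplanar/Hinge paths.

    reference is an iterable of superpixel ids; relations maps (i, j) to
    "Coplanar", "Hinge", or "Crack".
    """
    adjacency = {}
    for (i, j), rel in relations.items():
        if rel in ("Coplanar", "Hinge"):   # Crack edges cannot be traversed
            adjacency.setdefault(i, set()).add(j)
            adjacency.setdefault(j, set()).add(i)
    reached = set(reference)
    queue = deque(reached)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


def crossing_pairs(relations, reached):
    """Return E_c: adjacent pairs (i, k) with i inside S_r and k outside it."""
    pairs = set()
    for i, j in relations:
        if i in reached and j not in reached:
            pairs.add((i, j))
        elif j in reached and i not in reached:
            pairs.add((j, i))
    return pairs
```

Every pair in E_c straddles the border of S_r, which is where the fifth optimization term links unreachable superpixels back to the rest of the scene.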
In the second energy function E(θ, s), the fifth optimization term E_pair(θ_i, θ_k), which is related to the plane parameter θ_i of superpixel i and the plane parameter θ_k of superpixel k, is expressed as follows:
The fifth optimization term E_pair(θ_i, θ_k) ensures that the plane parameters of all superpixels are subject to the second energy function E(θ, s) described above.
Determining the plane parameter θ_i and the scale factor s_i of superpixel i by optimizing the second energy function E(θ, s) includes:
First, the plane parameter θ_i is randomly initialized.
Secondly, the fourth optimization term E_Rel(θ_i, θ_j) and the fifth optimization term E_pair(θ_i, θ_k) are determined according to the optimized plane parameters θ_j and θ_k of the adjacent superpixel j and the superpixel k.
Finally, following a fast propagation method, the plane parameter θ_i is optimized by randomly generating model parameters.
The optimization process may be ended after a set number of optimization iterations, when the plane parameter θ_i converges, or in other ways, and the present disclosure does not specifically limit this.
In one possible implementation, performing depth reconstruction on the dynamic scene corresponding to the image frame set according to the homography matrix H corresponding to each superpixel and the plane parameter θ of each superpixel includes: decomposing the homography matrix H corresponding to each superpixel to obtain the camera rotation parameter R_0 and the camera translation parameter t_0 corresponding to each superpixel; and constructing a dense depth map of the dynamic scene corresponding to the image frame set according to the camera rotation parameter R_0 and the camera translation parameter t_0 corresponding to each superpixel and the plane parameter θ of each superpixel.
Still taking Fig. 2 as an example, the motion selection module decomposes the homography matrix H_i corresponding to each superpixel, obtains the camera rotation parameter R_0 and the camera translation parameter t_0 corresponding to each superpixel, and outputs them to the reconstruction module. The reconstruction module then constructs a dense depth map D of the dynamic scene corresponding to the image frame set according to the camera rotation parameter R_0 and the camera translation parameter t_0 corresponding to each superpixel and the plane parameter θ of each superpixel, and the optimization module further optimizes D to obtain an optimized dense depth map D_R. Fig. 3 shows a schematic diagram of the result of depth reconstruction using two consecutive image frames according to an embodiment of the present disclosure.
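To illustrate how per-superpixel planes yield a dense depth map, the sketch below intersects each pixel's viewing ray with the plane of its superpixel. It rests on assumptions that are not taken from the disclosure: a pinhole camera with intrinsics fx, fy, cx, cy, and a plane stored as a pair (normal, distance) satisfying normal · X = distance in camera coordinates. The parametrization of θ used in the disclosure is not reproduced here, and the names plane_depth and dense_depth are hypothetical.

```python
def plane_depth(u, v, normal, distance, fx, fy, cx, cy):
    """Depth z at pixel (u, v) for the plane normal . X = distance.

    The back-projected point is X = z * ray with ray = K^-1 (u, v, 1),
    so z = distance / (normal . ray). Returns None if the ray is
    parallel to the plane.
    """
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-12:
        return None
    return distance / denom


def dense_depth(labels, planes, fx, fy, cx, cy):
    """Build a depth map; planes maps a superpixel id to (normal, distance)."""
    return [
        [plane_depth(x, y, *planes[label], fx, fy, cx, cy) for x, label in enumerate(row)]
        for y, row in enumerate(labels)
    ]
```

For a fronto-parallel plane with normal (0, 0, 1), the depth equals the plane distance at every pixel, which gives a quick sanity check of the formula.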
In summary, a feature point matching result M between adjacent image frames in an image frame set and a superpixel segmentation result S of a target image frame are determined, where the image frame set includes at least two image frames, the target image frame is one frame in the image frame set, and the superpixel segmentation result S includes a plurality of superpixels. According to the feature point matching result M and the superpixel segmentation result S, the homography matrix H corresponding to each superpixel and the motion relationship R_e between adjacent superpixels are determined. The spatial relationship R_s between adjacent superpixels is determined according to the motion relationship R_e, and the plane parameter θ of each superpixel is determined according to the spatial relationship R_s. Depth reconstruction is then performed on the dynamic scene corresponding to the image frame set according to the homography matrix H corresponding to each superpixel and the plane parameter θ of each superpixel. In this way, the spatial relationship between superpixels is determined from their motion relationship, so that the plane parameter of each superpixel can be determined and depth reconstruction of a complex motion scene is effectively realized.
Fig. 4 shows a schematic structural diagram of a depth reconstruction apparatus based on superpixel relationship analysis according to an embodiment of the present disclosure. As shown in Fig. 4, the apparatus 40 includes:
a first determining module 41, configured to determine a feature point matching result M between adjacent image frames in an image frame set and a superpixel segmentation result S of a target image frame, where the image frame set includes at least two image frames, the target image frame is one frame in the image frame set, and the superpixel segmentation result S includes a plurality of superpixels;
a second determining module 42, configured to determine, according to the feature point matching result M and the superpixel segmentation result S, a homography matrix H corresponding to each superpixel and a motion relationship R_e between adjacent superpixels;
a third determining module 43, configured to determine a spatial relationship R_s between adjacent superpixels according to the motion relationship R_e between adjacent superpixels;
a fourth determining module 44, configured to determine a plane parameter θ of each superpixel according to the spatial relationship R_s between adjacent superpixels; and
a reconstruction module 45, configured to perform depth reconstruction on the dynamic scene corresponding to the image frame set according to the homography matrix H corresponding to each superpixel and the plane parameter θ of each superpixel.
In one possible implementation, the second determining module 42 is specifically configured to:
for any superpixel i, optimize a first energy function E(H, R_e) to obtain the homography matrix H_i corresponding to superpixel i and the motion relationship R_e(i, j) between superpixel i and an adjacent superpixel j;
where the first energy function E(H, R_e) includes: a first optimization term E_data(H_i) related to the homography matrix H_i corresponding to superpixel i; a second optimization term E_pair(H_i, H_j, R_e(i, j)) related to the homography matrix H_i corresponding to superpixel i, the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j, and the homography matrix H_j corresponding to the adjacent superpixel j; and a constant term E_o(R_e(i, j)) related to the motion relationship R_e(i, j) between superpixel i and the adjacent superpixel j.
In one possible implementation, the homography matrix H_j corresponding to the adjacent superpixel j is an already-optimized homography matrix.
In one possible implementation, the third determining module 43 is specifically configured to:
determine the relationship type of the motion relationship R_e between adjacent superpixels as the relationship type of the spatial relationship R_s between the adjacent superpixels, where the relationship types include Coplanar, Hinge, and Crack.
In one possible implementation, the fourth determining module 44 includes:
a first determining submodule, configured to determine a reference superpixel set S_t among the plurality of superpixels based on a preset algorithm, where the reference superpixel set S_t corresponds to a background portion of the target image frame;
a second determining submodule, configured to determine the scale factor s of each reference superpixel in the reference superpixel set S_t as 1; and
a third determining submodule, configured to determine the plane parameter θ of each superpixel according to the reference superpixel set S_t and the spatial relationship R_s between adjacent superpixels.
In one possible implementation, the third determining submodule is specifically configured to:
for any superpixel i, optimize a second energy function E(θ, s) to obtain the plane parameter θ_i and the scale factor s_i of superpixel i;
where the second energy function E(θ, s) includes: a third optimization term E_fit(θ_i, s_i) related to the plane parameter θ_i and the scale factor s_i of superpixel i; a fourth optimization term E_Rel(θ_i, θ_j) related to the plane parameter θ_i of superpixel i and the plane parameter θ_j of the adjacent superpixel j corresponding to superpixel i; and a constant term E_occ(s_i) related to the scale factor s_i of superpixel i.
In one possible implementation, when the relationship types of the spatial relationships R_s(i, j) between superpixel i and all adjacent superpixels j are all Crack, the second energy function E(θ, s) further includes a fifth optimization term E_pair(θ_i, θ_k) related to the plane parameter θ_i of superpixel i and the plane parameter θ_k of a superpixel k, where superpixel k is a superpixel outside the superpixel set S_r, the superpixel set S_r consists of the superpixels that can reach the reference superpixel set S_t along a target path, and every superpixel relationship on the target path is of type Coplanar or Hinge.
In one possible implementation, the plane parameter θ_j of the adjacent superpixel j and the plane parameter θ_k of superpixel k are already-optimized plane parameters.
In one possible implementation, the reconstruction module 45 is specifically configured to:
decompose the homography matrix H corresponding to each superpixel to obtain the camera rotation parameter R_0 and the camera translation parameter t_0 corresponding to each superpixel; and
construct a dense depth map of the dynamic scene corresponding to the image frame set according to the camera rotation parameter R_0 and the camera translation parameter t_0 corresponding to each superpixel and the plane parameter θ of each superpixel.
The apparatus 40 provided in the present disclosure can implement each step of the method embodiments shown in Fig. 1 and/or Fig. 2 and achieve the same technical effects; to avoid repetition, details are not described here again.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.