CN114677423A - Indoor space panoramic depth determination method and related equipment - Google Patents

Indoor space panoramic depth determination method and related equipment

Info

Publication number
CN114677423A
CN114677423A CN202210225132.9A
Authority
CN
China
Prior art keywords
depth
loss
panoramic
target
indoor space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210225132.9A
Other languages
Chinese (zh)
Inventor
王旭
孔伟锋
张秋丹
邬文慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202210225132.9A priority Critical patent/CN114677423A/en
Publication of CN114677423A publication Critical patent/CN114677423A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds


Abstract

The embodiment of the invention discloses a method for determining the panoramic depth of an indoor space and related equipment. The method comprises the following steps: acquiring a panoramic image of the indoor space; extracting the features of the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary prediction depth map; obtaining loss information of the preliminary prediction depth map, wherein the loss information comprises significant direction normal loss and plane consistency depth loss; and optimizing the preliminary prediction depth map based on the loss information to determine the indoor space panoramic depth. The method provided by the application can make full use of the abundant structure and geometric information of the indoor scene, solve the problem of back propagation of the weak texture area of the indoor scene, and improve the accuracy of panoramic depth estimation.

Description

Indoor space panoramic depth determination method and related equipment
Technical Field
The invention relates to the technical field of panoramic images, in particular to a method for determining the panoramic depth of an indoor space and related equipment.
Background
Indoor scenes can be captured and reconstructed rapidly and at low cost through panoramic images. Panoramic depth estimation is one of the basic tasks of indoor scene understanding, and the predicted depth results can be used for downstream tasks such as automatic driving, augmented reality and three-dimensional reconstruction. The depth estimation task is to estimate the depth of the scene in a panoramic image, namely the distance from each pixel point to the camera. Conventional depth estimation methods need to rely on multiple images, such as structure from motion and stereo matching. These methods all rely on the matching of feature points, and the predicted depth maps are sparse. With the rapid development of deep learning, monocular depth estimation based on deep learning has been widely studied and achieves higher prediction accuracy.
However, much depth estimation work is designed for perspective images. Since perspective images have a much smaller field of view than panoramic images, directly applying these methods to panoramic depth estimation yields poor results. Unlike a perspective image, a panoramic image has a larger field of view, but also larger projection distortion. Moreover, the above methods are all fully supervised and require accurate ground-truth depth labels. Acquiring such depth labels is difficult and requires significant labor and cost.
In order to alleviate the problem of depth label acquisition, self-supervised depth estimation methods have developed rapidly. Self-supervised depth estimation uses a photometric consistency loss, i.e., it supervises the difference between the reconstructed RGB image and the target RGB image, thereby bypassing the need for ground-truth depth map labels. However, existing self-supervised panoramic depth estimation networks only consider how to handle the distortion of the panoramic image, and indoor panoramic depth estimation still suffers from the problem of back propagation in weak-texture regions.
Disclosure of Invention
In this summary, concepts in a simplified form are introduced that are further described in the detailed description. This summary of the invention is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In order to solve the above problem, in a first aspect, the present invention provides a method for determining a panoramic depth of an indoor space, where the method includes:
acquiring a panoramic image of the indoor space;
extracting the features of the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary prediction depth map;
obtaining loss information of the preliminary prediction depth map, wherein the loss information comprises significant direction normal loss and plane consistency depth loss;
and optimizing the preliminary prediction depth map based on the loss information to determine the indoor space panoramic depth.
Optionally, the method further includes:
according to the normals of all pixel points in the preliminary prediction depth map:
taking the significant direction with the maximum cosine similarity of the normal of the pixel points as a target significant direction;
aligning the normal of the pixel point with the target salient direction;
and constructing the target significant direction normal loss by taking the normal of the pixel point as a supervision signal.
Optionally, the method further includes:
acquiring the optimal vanishing point direction and the predicted normal direction of the panoramic image;
and taking the optimal vanishing point direction and the predicted normal direction as the target significant direction.
Optionally, the method further includes:
acquiring the plane depth of each pixel point in a target three-dimensional plane, wherein the plane depth is acquired in the target plane, and the target plane is acquired by fusing and screening the color information and the geometric information of the panoramic image;
determining the planar conformance depth loss based on the preliminary predicted depth map and the planar depth.
Optionally, the method further includes:
acquiring three-dimensional coordinate information of each pixel point based on the preliminary prediction depth map to determine geometric information;
fusing and calculating the color information and the geometric information to obtain an edge map;
dividing the edge map to obtain a plurality of plane areas;
and taking the plane area with the pixel value exceeding the preset pixel as the target plane.
Optionally, the loss information further includes a spherical viewpoint generation loss;
the method further comprises the following steps:
obtaining the photometric consistency loss of the target viewpoint;
acquiring a weighted value of the target viewpoint;
and acquiring the spherical viewpoint generation loss based on the photometric consistency loss of the target viewpoint and the weighted value.
Optionally, the loss information further includes a smooth predicted depth loss;
The method further comprises the following steps:
the above-described smooth predicted depth loss is determined based on the three-dimensional cartesian coordinates of the target pixel and the enhancement weights.
In a second aspect, the present invention further provides an apparatus for determining a panoramic depth of an indoor space, including:
a first acquisition unit configured to acquire a panoramic image in the indoor space;
a second obtaining unit, configured to perform feature extraction on the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary predicted depth map;
a third obtaining unit, configured to obtain loss information of the preliminary prediction depth map, where the loss information includes a significant direction normal loss and a planar consistency depth loss;
a determining unit, configured to optimize the preliminary predicted depth map based on the loss information to determine the indoor space panoramic depth.
In a third aspect, an electronic device includes: a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor is configured to implement the steps of the indoor space panoramic depth determination method according to any one of the first aspect described above when the computer program stored in the memory is executed.
In a fourth aspect, the present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the indoor space panoramic depth determination method of any one of the above aspects of the first aspect.
The embodiment of the invention has the following beneficial effects:
after the method for determining the indoor space panoramic depth is adopted, firstly, the panoramic image is subjected to feature extraction through a lightweight backbone network CoordNet to obtain a preliminary prediction depth map, and the preliminary prediction depth map is optimized through loss information including target significant direction normal loss and plane consistency depth loss to obtain the indoor space panoramic depth. Abundant structure and geometric information of the indoor scene can be fully utilized, the problem of back propagation of weak texture areas of the indoor scene is solved, and accuracy of panoramic depth estimation is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
fig. 1 is a schematic flow chart of a method for determining a panoramic depth of an indoor space according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a panoramic depth estimation model according to an embodiment of the present disclosure;
fig. 3 is a schematic view of panoramic depth calculation effects of different panoramic depth determination methods according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an apparatus for determining a panoramic depth of an indoor space according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device for determining a panoramic depth of an indoor space according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
Referring to fig. 1, a schematic flow chart of a method for determining a panoramic depth of an indoor space provided in an embodiment of the present application may specifically include:
s110, acquiring a panoramic image of the indoor space;
for example, a panoramic image of an indoor space is obtained, and the manner of obtaining the panoramic image may be directly obtained by a camera, or may be performed based on multiple pictures, and the manner of obtaining the panoramic image is not limited herein.
S120, extracting features of the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary prediction depth map;
for example, some existing methods adopt the commonly used equidistant cylindrical (equirectangular) projection as the projection mode of the input panoramic image. Considering the distortion of equidistant cylindrical projection images, this scheme adopts the lightweight backbone network CoordNet to extract features. The network not only replaces the commonly used ReLU activation function and batch normalization with an ELU activation function, but also applies coordinate convolution. In order to learn the position information of the panoramic image, the coordinate convolution adds coordinate information in both the horizontal and vertical directions to the input feature map, as illustrated in the sketch below. Given a panoramic RGB image as input, a preliminary predicted depth map is obtained by feature extraction through CoordNet.
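As a rough illustration of the coordinate convolution just described, the following PyTorch-style sketch concatenates normalized horizontal and vertical coordinate channels to the input feature map before a standard convolution and uses an ELU activation. The class name, kernel size and normalization range are assumptions made for illustration only and do not reproduce the actual CoordNet architecture.

import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Convolution preceded by concatenation of normalized (x, y) coordinate
    channels, followed by an ELU activation (illustrative only)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, kernel_size, padding=padding)
        self.act = nn.ELU()  # ELU rather than the common ReLU, per the description

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.act(self.conv(torch.cat([x, xs, ys], dim=1)))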
S130, obtaining loss information of the preliminary prediction depth map, wherein the loss information comprises target significant direction normal loss and plane consistency depth loss;
illustratively, loss information of the panoramic image is acquired, and the loss information of the panoramic image comprises a target salient direction normal loss and a plane consistency depth loss, namely salient direction normal constraint and plane consistency depth constraint are carried out on the panoramic image. In the target salient direction normal constraint, a panoramic vanishing point of an input panoramic image is detected, then the salient direction in a scene is obtained, then the most similar salient direction (target salient direction) is found according to the predicted normal vector of each pixel point for alignment, and the aligned normal and the predicted normal calculation loss form the constraint. In the plane consistency depth constraint, the aligned normal information and the color information of the input panoramic image are used for panoramic plane area detection. The detected plane area can be converted into a three-dimensional plane according to the predicted depth information, then the depth between the plane area and the camera is calculated, and then constraint is formed by the calculated loss of the plane area and the original preliminary predicted depth map.
And S140, optimizing the preliminary prediction depth map based on the loss information to determine the indoor space panoramic depth.
Illustratively, the preliminary prediction depth map is optimized through loss information including significant direction normal loss and plane consistency depth loss, so that abundant structural and geometric information of an indoor scene can be fully utilized, the problem of back propagation of weak texture areas of the indoor scene is solved, and the accuracy of the panoramic depth determination method is improved as much as possible.
In summary, in the method provided by the embodiment of the present application, feature extraction is performed on the panoramic image through a lightweight backbone network CoordNet to obtain a preliminary predicted depth map, and the preliminary predicted depth map is optimized through loss information including significant direction normal loss and planar consistency depth loss to obtain an indoor space panoramic depth. Abundant structure and geometric information of the indoor scene can be fully utilized, the problem of back propagation of weak texture areas of the indoor scene is solved, and accuracy of panoramic depth estimation is improved.
In some examples, the method further comprises:
obtaining the normals of all pixel points in the preliminary prediction depth map:
taking the significant direction with the maximum cosine similarity of the normal of the pixel point as a target significant direction;
aligning the normal of the pixel point with the target salient direction;
And constructing the target significant direction normal loss by taking the normal of the pixel point as a supervision signal.
Illustratively, the salient directions refer to the normal directions of the main surfaces (floor, walls, ceiling) of an indoor scene satisfying the Manhattan world assumption, and the present application uses the target salient direction as a supervision signal to align the normal of each corresponding pixel of the preliminary predicted depth map. In order to constrain the predicted normals with the target salient directions, a corresponding normal map is first computed from the depth map predicted by the model.
For this, all pixel points are first projected to their corresponding three-dimensional coordinates P according to the predicted depth map, as in formula (1):
P = D_p · K^{-1} p (1)
where K^{-1} denotes the ERP (equirectangular projection) inverse projection matrix and D_p denotes the preliminary predicted depth of pixel p.
The normal n_p of each point is then computed from the obtained three-dimensional coordinates P. Specifically, the normal vector n_p of each pixel is calculated from its 7 × 7 neighborhood.
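A minimal numerical sketch of this back-projection and neighborhood-based normal estimation is given below, assuming an equirectangular depth map and a simple PCA plane fit over each local window; the axis convention and the function names are illustrative assumptions, not the exact procedure of the patent.

import numpy as np

def backproject_erp(depth):
    """Back-project an equirectangular depth map (H, W) to per-pixel 3D points,
    a sketch of formula (1); the axis convention is assumed."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    phi = 2.0 * np.pi * u / w          # longitude
    theta = np.pi * v / h              # colatitude
    dirs = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)   # unit ray directions (K^-1 p)
    return depth[..., None] * dirs              # P = D_p * K^-1 p

def normals_from_points(points, k=7):
    """Estimate a per-pixel normal from a k x k neighborhood by fitting a plane
    with PCA (smallest-singular-vector direction); a simple stand-in for the
    7 x 7 neighborhood normal computation described above."""
    h, w, _ = points.shape
    r = k // 2
    normals = np.zeros_like(points)
    for y in range(r, h - r):
        for x in range(r, w - r):
            patch = points[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 3)
            patch = patch - patch.mean(axis=0)
            _, _, vt = np.linalg.svd(patch, full_matrices=False)
            n = vt[-1]
            normals[y, x] = n / (np.linalg.norm(n) + 1e-8)
    return normals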
Then, for each pixel point, the salient direction with the largest cosine similarity to its normal is found for alignment, and the aligned normal is denoted n̂_p; the specific calculation is formula (3):
n̂_p = argmax_{d ∈ D} cos(n_p, d) (3)
where D denotes the set of salient directions. The scheme only aligns pixels whose cosine similarity cos(n_p, n̂_p) between the normal and the salient direction exceeds a threshold γ. The salient direction mask M_p is therefore defined by formula (4):
M_p = 1 if cos(n_p, n̂_p) > γ, and M_p = 0 otherwise (4)
Since the normals predicted by the CoordNet model are noisy in the initial stage of training, a smaller threshold is set initially so that more pixel points are constrained by the more accurate salient directions, and the threshold is then increased with the training epoch; the specific schedule is γ = γ_1 · N_epoch + γ_2, where γ_1 and γ_2 may be set to 1.663e-3 and 0.9 respectively.
Finally, the salient direction normal loss is constructed. Each pixel in the salient direction mask is constrained by its corresponding aligned normal n̂_p. Using the aligned normal as a supervision signal, formula (5) defines the loss as the mean discrepancy between the predicted normal n_p and the aligned normal n̂_p over the masked pixels:
L_normal = (1/N_v) Σ_{p: M_p = 1} ‖n_p − n̂_p‖ (5)
where N_v represents the number of pixels contained in the salient direction mask M_p.
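The alignment, masking and loss computation of formulas (3)-(5) can be sketched as follows; the salient directions are assumed to be unit vectors, and the use of an L1 discrepancy for the loss term is an assumption made for illustration.

import numpy as np

def salient_direction_normal_loss(normals, directions, gamma):
    """Align each predicted normal with the salient direction of highest cosine
    similarity and average the discrepancy over masked pixels; a sketch of
    formulas (3)-(5)."""
    n = normals.reshape(-1, 3)                       # (N, 3) predicted normals
    d = np.asarray(directions)                       # (6, 3) salient directions
    cos = n @ d.T                                    # cosine similarities (unit vectors assumed)
    best = cos.argmax(axis=1)
    aligned = d[best]                                # aligned normal per pixel, formula (3)
    best_cos = cos[np.arange(len(n)), best]
    mask = best_cos > gamma                          # salient direction mask M_p, formula (4)
    if not mask.any():
        return 0.0
    return np.abs(n[mask] - aligned[mask]).sum(axis=1).mean()   # formula (5), L1 assumed

# epoch-dependent threshold: gamma = gamma1 * N_epoch + gamma2
gamma1, gamma2 = 1.663e-3, 0.9
gamma = lambda epoch: gamma1 * epoch + gamma2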
In some examples, the method further comprises:
acquiring the optimal vanishing point direction and the predicted normal direction of the panoramic image;
and taking the optimal vanishing point direction and the predicted normal direction as the target significant direction.
For example, in order to obtain the salient directions of the indoor panoramic image, vanishing points in the panoramic image are first detected. For example, given an equidistant cylindrical projection image with resolution W × H, its pixel coordinate p(u, v) can be directly mapped to the corresponding angular spherical coordinate p(φ, θ) by the two formulas φ = 2πu/W and θ = πv/H. Since a straight line in the real world is projected as a curve in the panoramic image, the common vanishing point detection methods for perspective images are not applicable. Under the panoramic projection, a straight line in the real world corresponds to a circular plane passing through the center of the sphere, so this embodiment represents each real-world straight line by the normal of its corresponding circular plane.
Specifically, all edge point sets can be obtained by applying Canny edge detection to the input panoramic image, where each point can be mapped to three-dimensional coordinates according to the back-projection of the equidistant cylindrical projection, and each edge is then represented by the outer product of two points randomly sampled from its point set. In this way, each edge is transformed to the normal of its corresponding circular plane. By computing the outer products of all pairs of such normals belonging to parallel lines, all possible vanishing point candidates can be obtained. Finally, the RANSAC algorithm can be used to screen out the optimal vanishing points from all candidates. Since the normal direction predicted by the model may be opposite to the vanishing point direction, the final salient directions include the three detected vanishing point directions and their opposite directions, six directions in total.
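A heavily simplified sketch of this vanishing-point procedure is given below, using OpenCV's Canny detector and a RANSAC-style vote over great-circle normals. The grouping of edge pixels by connected components, the iteration count and the thresholds are all assumptions, and only a single vanishing direction (with its opposite) is returned for brevity.

import numpy as np
import cv2
from scipy import ndimage

def detect_vanishing_directions(pano_bgr, iters=2000, thresh=0.02, seed=0):
    """Lift Canny edges to the unit sphere, represent each edge segment by the
    normal of its great circle (outer product of two sampled points), form
    vanishing-point candidates from outer products of pairs of edge normals,
    and keep the candidate supported by the most edges (RANSAC-style vote)."""
    h, w = pano_bgr.shape[:2]
    edges = cv2.Canny(cv2.cvtColor(pano_bgr, cv2.COLOR_BGR2GRAY), 50, 150)
    labels, n_seg = ndimage.label(edges > 0)
    rng = np.random.default_rng(seed)

    def to_sphere(v, u):
        phi, theta = 2 * np.pi * u / w, np.pi * v / h
        return np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

    normals = []
    for s in range(1, n_seg + 1):
        vs, us = np.nonzero(labels == s)
        if len(vs) < 2:
            continue
        i, j = rng.choice(len(vs), 2, replace=False)
        m = np.cross(to_sphere(vs[i], us[i]), to_sphere(vs[j], us[j]))
        if np.linalg.norm(m) > 1e-6:
            normals.append(m / np.linalg.norm(m))
    normals = np.array(normals)

    best, best_score = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(normals), 2, replace=False)
        v = np.cross(normals[i], normals[j])          # candidate vanishing direction
        if np.linalg.norm(v) < 1e-6:
            continue
        v /= np.linalg.norm(v)
        score = np.sum(np.abs(normals @ v) < thresh)  # edges whose great circle passes near v
        if score > best_score:
            best, best_score = v, score
    return [best, -best]   # one vanishing direction and its opposite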
In some examples, the method further comprises:
acquiring the plane depth of each pixel point in a target three-dimensional plane, wherein the plane depth is acquired in the target plane, and the target plane is acquired by fusing and screening the color information and the geometric information of the panoramic image;
determining the planar conformance depth loss based on the preliminary predicted depth map and the planar depth.
Illustratively, each target plane obtained after fusing and screening the color information and geometric information of the panoramic image is defined by formula (6):
X_i A_i = Y (6)
where X_i denotes the three-dimensional coordinates of the pixel points in the i-th plane area (one row per point), A_i denotes the plane parameters of the corresponding three-dimensional plane, and Y = [1, 1, …, 1]^T is an N-dimensional column vector with all values equal to 1. The plane parameter A_i can be obtained through the closed form of the least squares problem, as in formula (7):
A_i = (X_i^T X_i + εE)^{-1} X_i^T Y (7)
where εE is a weighted identity matrix added for numerical stability.
The plane depth D̃_p of each pixel point in the target plane can then be calculated through its corresponding plane parameter A_i; combining formulas (1) and (6), formula (8) gives
D̃_p = 1 / (A_i^T K^{-1} p) (8)
The plane consistency depth loss L_plane is determined from the preliminary predicted depth map and the plane depth by formula (9), as the mean discrepancy between them over all planar pixels:
L_plane = (1/N_plane) Σ_p |D_p − D̃_p| (9)
where N_plane represents the total number of pixels in all plane areas.
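A compact sketch of the closed-form plane fitting and the resulting plane consistency depth loss (formulas (6)-(9)) follows; the variable names, the eps value and the L1 penalty are assumptions made for illustration.

import numpy as np

def plane_consistency_loss(points, depth, rays, plane_masks, eps=1e-4):
    """Fit a plane parameter A_i to the 3D points of each detected plane region
    in closed form, recover the plane-induced depth from A_i and the pixel's
    unit ray (K^-1 p), and penalize the difference from the predicted depth."""
    total, count = 0.0, 0
    for mask in plane_masks:                        # boolean (H, W) mask per region
        X = points[mask]                            # (N, 3) 3D coordinates in the region
        Y = np.ones(len(X))
        # A_i = (X^T X + eps * E)^-1 X^T Y, closed-form least squares, formula (7)
        A = np.linalg.solve(X.T @ X + eps * np.eye(3), X.T @ Y)
        plane_depth = 1.0 / np.maximum(rays[mask] @ A, 1e-6)   # formula (8)
        total += np.abs(depth[mask] - plane_depth).sum()       # formula (9) numerator
        count += mask.sum()
    return total / max(count, 1)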
In some examples, the method further comprises:
acquiring three-dimensional coordinate information of each pixel point based on the preliminary prediction depth map to determine geometric information;
fusing and calculating the color information and the geometric information to obtain an edge map;
dividing the edge map to obtain a plurality of plane areas;
And taking the plane area with the pixel value exceeding the preset pixel as the target plane.
Illustratively, the plane consistency depth constraint means that the three-dimensional coordinates of all the pixel points are constrained by the corresponding three-dimensional plane. This constraint can solve the problem of back propagation of weak texture regions. At the same time, it can solve the disadvantage that significant directional normal constraints cannot act on the inclined plane. The three-dimensional plane is obtained by firstly obtaining a two-dimensional plane through a plane detection algorithm and then converting the two-dimensional plane into a three-dimensional space according to the predicted depth.
Specifically, in order to obtain the plane areas of the panoramic image, all pixel points are first converted into three-dimensional coordinates according to the predicted depth. The color information and geometric information of the panoramic image are then fused to compute an edge map, all candidate plane areas of the edge map are obtained with the Felzenszwalb superpixel segmentation algorithm, and the plane areas whose pixel count exceeds the preset number of pixels are taken as the target planes, as in the sketch below. The preset pixel count may be set to 200, for example.
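As a sketch of this plane region detection step, the snippet below segments the fused edge map with the Felzenszwalb algorithm from scikit-image and keeps regions larger than the preset pixel count; the segmentation parameters are assumptions and are not taken from the patent.

import numpy as np
from skimage.segmentation import felzenszwalb

def detect_plane_regions(edge_map, min_pixels=200):
    """Segment the fused colour/geometry edge map into candidate plane regions
    and keep regions exceeding the preset pixel count (200 by default)."""
    labels = felzenszwalb(edge_map, scale=100, sigma=0.5, min_size=50)
    masks = []
    for lbl in np.unique(labels):
        mask = labels == lbl
        if mask.sum() > min_pixels:        # keep regions exceeding the preset size
            masks.append(mask)
    return masks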
In some examples, the loss information further includes a spherical viewpoint generation loss;
the method further comprises the following steps:
obtaining the photometric consistency loss of the target viewpoint;
Acquiring a weighted value of the target viewpoint;
and acquiring the spherical viewpoint generation loss based on the photometric consistency loss of the target viewpoint and the weighted value.
Illustratively, in the spherical viewpoint generation task, the target viewpoint image Î_t is obtained by DIBR (depth image based rendering). The initial photometric consistency loss L_p used in self-supervised depth estimation is defined by formula (10); it measures, per pixel, the difference between the reconstructed target view Î_t and the target RGB image, with a balancing coefficient α set to 0.85. Considering the projection distortion of the panoramic image, a spherical weighting matrix W is designed as in formula (11):
W(p) = cos θ_p (11)
The spherical viewpoint generation loss L_view is then defined as the spherically weighted mean photometric loss over all valid pixels, as in formula (12):
L_view = Σ_p W(p) · L_p(p) / Σ_p W(p), summed over all valid pixels p (12)
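The spherical weighting and weighted averaging of formulas (11)-(12) can be sketched as below; the per-pixel photometric loss is taken as a precomputed input, and interpreting θ_p as latitude measured from the equator (so that the weight is largest at the equator and smallest at the poles) is an assumption.

import numpy as np

def spherical_view_loss(photometric, valid):
    """Spherically weighted mean of a per-pixel photometric loss map (H, W),
    restricted to valid pixels; sketch of formulas (11)-(12)."""
    h, w = photometric.shape
    lat = np.pi * (np.arange(h) + 0.5) / h - np.pi / 2   # latitude of each image row
    W = np.cos(lat)[:, None] * np.ones((h, w))           # W(p) = cos(theta_p), assumed per-row
    num = (W * photometric * valid).sum()
    den = (W * valid).sum() + 1e-8
    return num / den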
in some examples, the above loss information further includes a smooth predicted depth loss;
the method further comprises the following steps:
the above-described smooth predicted depth loss is determined based on the three-dimensional cartesian coordinates of the target pixel and the enhancement weights.
For example, the smooth predicted depth loss can be determined by formula (13), which penalizes the local variation of the predicted depth with respect to the three-dimensional Cartesian coordinates v = (x, y, z) of each pixel, where an enhancement weight is applied to strengthen the smoothness constraint in the more strongly distorted regions of the panorama.
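Since the exact form of formula (13) is not reproduced in the text above, the following is only a loose sketch of a distortion-weighted smoothness term over the predicted three-dimensional coordinates; the gradient operator and the enhancement weight used here are assumptions.

import numpy as np

def smooth_depth_loss(points):
    """Penalize local variation of the per-pixel 3D Cartesian coordinates (x, y, z),
    up-weighted towards the strongly distorted polar rows (illustrative only)."""
    h, _, _ = points.shape
    dx = np.abs(np.diff(points, axis=1)).sum(axis=-1)    # horizontal variation, (H, W-1)
    dy = np.abs(np.diff(points, axis=0)).sum(axis=-1)    # vertical variation, (H-1, W)
    lat = np.pi * np.arange(h) / h - np.pi / 2
    wgt = 1.0 + np.abs(np.sin(lat))                      # enhancement weight, larger near the poles
    return (wgt[:, None] * dx).mean() + (wgt[:-1, None] * dy).mean()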
In some examples, the proposed method by the present application proposes a panoramic depth model, as shown in fig. 2, the framework of the self-supervised indoor panoramic depth estimation model based on structural regularization consists of three main parts: 210: a panoramic depth estimation module; 220: a salient direction normal constraint module; 230: a planar conformance depth constraint module. Firstly, the panoramic image can predict a depth map through a panoramic depth estimation module. Each pixel point of the panoramic image can be converted into a three-dimensional coordinate according to the predicted depth, and then a normal vector of each point is calculated. In the constraint of normal in the significant direction, panoramic vanishing points of the input panoramic image are detected, the significant direction in the scene is obtained, then the most similar significant direction is found according to the predicted normal vector of each pixel point for alignment, and the aligned normal and the predicted normal calculation loss form the constraint. In the plane consistency depth constraint, the aligned normal information and the color information of the input panoramic image are used for panoramic plane area detection. The detected plane area can be converted into a three-dimensional plane according to the predicted depth information, then the depth between the plane area and the camera is calculated, and then constraint is formed with the original predicted depth calculation loss.
In some examples, the indoor panoramic depth determination method proposed by the present application is evaluated on two public indoor panoramic RGB-D datasets, namely the Matterport3D and Stanford2D3D datasets in 3D60. Common reference metrics are used to evaluate the panoramic depth estimation effect: Absolute Relative Error (Abs Rel), Squared Relative Error (Sq Rel), RMSE, RMSLE, and the sub-threshold accuracies (δ < 1.25^i, i = 1, 2, 3).
The results of the model performance comparison using different panoramic depth determination methods on the Matterport3D and Stanford2D3D test sets are given in Table 1. The model corresponding to this scheme (Ours) clearly achieves better performance on both datasets than the other reference models, and, compared with other self-supervised panoramic depth estimation methods, it has a smaller performance gap to the best current fully supervised panoramic depth estimation methods.
TABLE 1
The upper part of Table 1 gives the experimental results of the various panoramic depth estimation methods on the Matterport3D dataset. Compared with the other self-supervised reference models, the model corresponding to this method shows a large improvement on all metrics. The lower part of Table 1 shows the experimental results on the Stanford2D3D dataset. Similarly, compared with the other self-supervised reference models, the model corresponding to this method performs best on all metrics, particularly Sq Rel and RMSE. Meanwhile, for each dataset the upper rows represent the performance of fully supervised panoramic depth estimation methods and the lower rows the performance of self-supervised methods. It can be seen that a gap still remains between current self-supervised and fully supervised panoramic depth estimation methods, and that the proposed self-supervised panoramic depth estimation model reduces this gap.
For further analysis, the depth prediction results of the proposed method and of two better-performing reference models are visualized, as shown in fig. 3, which compares the prediction effects of the panoramic depth estimation models of three different methods on the Matterport3D and Stanford2D3D datasets, including the model corresponding to the panoramic depth determination method proposed by this scheme (Ours). The input panoramic image (Color) is shown in the first column and the reference depth (Ground-truth) in the last column. The method proposed by this scheme is compared with the models proposed by Huang et al., StructDepth, and Zioulis et al. As can be seen from the visualization results in fig. 3, compared with these methods, the panoramic depth determined by the proposed scheme is closer to the real depth, especially in planar areas such as floors, walls and ceilings. Meanwhile, the proposed method can predict richer details. These results all demonstrate that the proposed self-supervised indoor panoramic depth determination method can predict a more accurate panoramic depth map.
Referring to fig. 4, an embodiment of an indoor space panoramic depth determining apparatus in an embodiment of the present application may include:
A first acquisition unit 41 configured to acquire a panoramic image in the indoor space;
a second obtaining unit 42, configured to perform feature extraction on the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary predicted depth map;
a third obtaining unit 43, configured to obtain loss information of the preliminary prediction depth map, where the loss information includes a significant direction normal loss and a plane-consistency depth loss;
a determining unit 44, configured to optimize the preliminary predicted depth map based on the loss information to determine the indoor space panoramic depth.
As shown in fig. 5, an electronic device 300 is further provided in the embodiments of the present application, and includes a memory 310, a processor 320, and a computer program 311 stored on the memory 320 and executable on the processor, where the processor 320 executes the computer program 311 to implement the steps of any one of the methods for determining the panoramic depth of the indoor space.
Since the electronic device described in this embodiment is a device for implementing an indoor space panoramic depth determination apparatus in this embodiment, based on the method described in this embodiment, a person skilled in the art can understand a specific implementation manner of the electronic device of this embodiment and various variations thereof, so that how to implement the method in this embodiment by the electronic device is not described in detail herein, and as long as the person skilled in the art implements the device used in this embodiment, the scope of the present application is intended to be protected.
In a specific implementation, the computer program 311 may implement any of the embodiments corresponding to fig. 1 when executed by a processor.
It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Embodiments of the present application further provide a computer program product, which includes computer software instructions, when the computer software instructions are run on a processing device, cause the processing device to execute a flow of the method for determining an indoor space panoramic depth in the embodiment corresponding to fig. 1.
The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. Computer-readable storage media can be any available media that a computer can store or a data storage device, such as a server, data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for determining the panoramic depth of an indoor space is characterized by comprising the following steps:
acquiring a panoramic image of the indoor space;
performing feature extraction on the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary prediction depth map;
obtaining loss information of the preliminary prediction depth map, wherein the loss information comprises target significant direction normal loss and plane consistency depth loss;
optimizing the preliminary predicted depth map based on the loss information to determine the indoor space panoramic depth.
2. The method of claim 1, further comprising:
obtaining normals of all pixel points in the preliminary prediction depth map:
taking the significant direction with the maximum cosine similarity of the normal of the pixel point as a target significant direction;
aligning the normal of the pixel point with the target salient direction;
and constructing the target significant direction normal loss by taking the normal of the pixel point as a supervision signal.
3. The method of claim 2, further comprising:
acquiring the optimal vanishing point direction and the predicted normal direction of the panoramic image;
and taking the optimal vanishing point direction and the predicted normal direction as the target significant direction.
4. The method of claim 1, further comprising:
acquiring the plane depth of each pixel point in a target three-dimensional plane, wherein the plane depth is acquired in the target plane, and the target plane is acquired by fusing and screening the color information and the geometric information of the panoramic image;
determining the planar conformance depth loss based on the preliminary predicted depth map and the planar depth.
5. The method of claim 4, further comprising:
acquiring three-dimensional coordinate information of each pixel point based on the preliminary prediction depth map to determine geometric information;
fusing and calculating the color information and the geometric information to obtain an edge map;
dividing the edge map to obtain a plurality of plane areas;
and taking the plane area with the pixel value exceeding a preset pixel as the target plane.
6. The method of claim 1, wherein the loss information further comprises a spherical viewpoint generation loss;
The method further comprises the following steps:
obtaining the photometric consistency loss of the target viewpoint;
acquiring a weighted value of the target viewpoint;
and acquiring the spherical viewpoint generation loss based on the photometric consistency loss of the target viewpoint and the weighted value.
7. The method of claim 1, wherein the loss information further comprises a smooth predicted depth loss;
the method further comprises the following steps:
determining the smooth predicted depth loss based on the three-dimensional Cartesian coordinates of the target pixel and the enhancement weights.
8. An apparatus for determining a panoramic depth of an indoor space, comprising:
a first acquisition unit configured to acquire a panoramic image in the indoor space;
a second obtaining unit, configured to perform feature extraction on the panoramic image based on a lightweight backbone network CoordNet to obtain a preliminary predicted depth map;
a third obtaining unit, configured to obtain loss information of the preliminary prediction depth map, where the loss information includes a significant direction normal loss and a planar consistency depth loss;
a determining unit to optimize the preliminary predicted depth map based on the loss information to determine the indoor space panorama depth.
9. An electronic device, comprising: memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor is adapted to implement the steps of the indoor space panoramic depth determination method according to any of claims 1-7 when executing the computer program stored in the memory.
10. A computer-readable storage medium on which a computer program is stored, which, when being executed by a processor, implements the indoor space panoramic depth determination method according to any one of claims 1 to 7.
CN202210225132.9A 2022-03-09 2022-03-09 Indoor space panoramic depth determination method and related equipment Pending CN114677423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210225132.9A CN114677423A (en) 2022-03-09 2022-03-09 Indoor space panoramic depth determination method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210225132.9A CN114677423A (en) 2022-03-09 2022-03-09 Indoor space panoramic depth determination method and related equipment

Publications (1)

Publication Number Publication Date
CN114677423A true CN114677423A (en) 2022-06-28

Family

ID=82072688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210225132.9A Pending CN114677423A (en) 2022-03-09 2022-03-09 Indoor space panoramic depth determination method and related equipment

Country Status (1)

Country Link
CN (1) CN114677423A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297316A (en) * 2022-08-11 2022-11-04 杭州电子科技大学 Virtual viewpoint synthetic image hole filling method with context feature fusion



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination